Siegmund  Brandt 


Fourth  Edition 


EXTRAS  ONLINE 


Data  Analysis 


Determination  of  mean  foot  length 

Woodcut from Jacob Köbel's “Geometer”, published 1575 in Frankfurt


Siegmund  Brandt 


Data  Analysis 


Statistical  and  Computational  Methods 
for  Scientists  and  Engineers 


Fourth  Edition 


Translated  by  Glen  Cowan 


Springer 


Siegmund  Brandt 
Department  of  Physics 
University  of  Siegen 
Siegen,  Germany 


Additional  material  to  this  book  can  be  downloaded  from  http://extras.springer.com 

ISBN  978-3-319-03761-5  ISBN  978-3-319-03762-2  (eBook) 

DOI  10.1007/978-3-319-03762-2 

Springer  Cham  Heidelberg  New  York  Dordrecht  London 
Library  of  Congress  Control  Number:  2013957143 
©  Springer  International  Publishing  Switzerland  2014 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material
supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the
purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained
from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center.
Violations  are  liable  to  prosecution  under  the  respective  Copyright  Law. 

The  use  of  general  descriptive  names,  registered  names,  trademarks,  service  marks,  etc.  in  this  publication  does  not 
imply,  even  in  the  absence  of  a  specific  statement,  that  such  names  are  exempt  from  the  relevant  protective  laws  and 
regulations  and  therefore  free  for  general  use. 

While  the  advice  and  information  in  this  book  are  believed  to  be  true  and  accurate  at  the  date  of  publication,  neither 
the  authors  nor  the  editors  nor  the  publisher  can  accept  any  legal  responsibility  for  any  errors  or  omissions  that  may 
be  made.  The  publisher  makes  no  warranty,  express  or  implied,  with  respect  to  the  material  contained  herein. 

Printed  on  acid-free  paper 

Springer  is  part  of  Springer  Science+Business  Media  (www.springer.com) 


Preface  to  the  Fourth  English  Edition 


For  the  present  edition,  the  book  has  undergone  two  major  changes:  Its 
appearance  was  tightened  significantly  and  the  programs  are  now  written  in 
the modern programming language Java.

Tightening was possible without giving up essential contents by expedient
use of the Internet. Since practically all users can connect to the net, it is
no  longer  necessary  to  reproduce  program  listings  in  the  printed  text.  In  this 
way,  the  physical  size  of  the  book  was  reduced  considerably. 

The Java language offers a number of advantages over the older programming
languages used in earlier editions. It is object-oriented and hence also
more readable. It includes access to libraries of user-friendly auxiliary routines,
allowing for instance the easy creation of windows for input, output,
or  graphics.  For  most  popular  computers,  Java  is  either  preinstalled  or  can  be 
downloaded  from  the  Internet  free  of  charge.  (See  Sect.  1.3  for  details.)  Since 
by  now  Java  is  often  taught  at  school,  many  students  are  already  somewhat 
familiar  with  the  language. 

Our  Java  programs  for  data  analysis  and  for  the  production  of  graphics, 
including  many  example  programs  and  solutions  to  programming  problems, 
can  be  downloaded  from  the  page  extras.springer.com. 

I  am  grateful  to  Dr.  Tilo  Stroh  for  numerous  stimulating  discussions  and 
technical  help.  The  graphics  programs  are  based  on  previous  common  work. 

Siegen,  Germany  Siegmund  Brandt 
1. Introduction


1.1  Typical  Problems  of  Data  Analysis 

Every branch of experimental science, after passing through an early stage
of qualitative description, concerns itself with quantitative studies of the phenomena
of interest, i.e., measurements. In addition to designing and carrying
out  the  experiment,  an  important  task  is  the  accurate  evaluation  and  complete 
exploitation  of  the  data  obtained.  Let  us  list  a  few  typical  problems. 

1. A study is made of the weight of laboratory animals under the influence
of various drugs. After the application of drug A to 25 animals, an
average increase of 5 % is observed. Drug B, used on 10 animals, yields
a  3  %  increase.  Is  drug  A  more  effective?  The  averages  5  and  3  %  give 
practically  no  answer  to  this  question,  since  the  lower  value  may  have 
been  caused  by  a  single  animal  that  lost  weight  for  some  unrelated 
reason.  One  must  therefore  study  the  distribution  of  individual  weights 
and  their  spread  around  the  average  value.  Moreover,  one  has  to  decide 
whether  the  number  of  test  animals  used  will  enable  one  to  differentiate 
with  a  certain  accuracy  between  the  effects  of  the  two  drugs. 

2.  In  experiments  on  crystal  growth  it  is  essential  to  maintain  exactly  the 
ratios of the different components. From a total of 500 crystals, a sample
of 20 is selected and analyzed. What conclusions can be drawn
about  the  composition  of  the  remaining  480?  This  problem  of  sampling 
comes  up,  for  example,  in  quality  control,  reliability  tests  of  automatic 
measuring  devices,  and  opinion  polls. 

3.  A  certain  experimental  result  has  been  obtained.  It  must  be  decided 
whether  it  is  in  contradiction  with  some  predicted  theoretical  value 
or  with  previous  experiments.  The  experiment  is  used  for  hypothesis 
testing. 
4. A general law is known to describe the dependence of measured
variables, but parameters of this law must be obtained from experiment.
In radioactive decay, for example, the number $N$ of atoms that
decay per second decreases exponentially with time: $N(t) = \text{const}\cdot\exp(-\lambda t)$.
One wishes to determine the decay constant $\lambda$ and its measurement
error by making maximal use of a series of measured values
$N_1(t_1), N_2(t_2), \ldots$. One is concerned here with the problem of
fitting a function containing unknown parameters to the data and the
determination of the numerical values of the parameters and their
errors.
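Problem 4 can be made concrete with a minimal sketch in Java, the language of the book's programs. The sketch is not taken from the book's datan library; the class `DecayFit` and its method `estimateLambda` are hypothetical names. It uses the simplest possible approach. Taking logarithms turns $N(t) = \text{const}\cdot\exp(-\lambda t)$ into the straight line $\ln N = \ln(\text{const}) - \lambda t$, and the slope of that line is found by unweighted least squares. The simulated measurement errors are assumed to be Gaussian with standard deviation $\sqrt{N}$, which is only a rough stand-in for counting statistics.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not part of the book's datan library): estimate the
 * decay constant lambda in N(t) = c * exp(-lambda * t) by an unweighted
 * straight-line least-squares fit of ln N versus t.
 */
class DecayFit {

    /** Least-squares estimate of lambda from pairs (t[i], n[i]); requires n[i] > 0. */
    static double estimateLambda(double[] t, double[] n) {
        int m = t.length;
        double sumT = 0, sumY = 0, sumTT = 0, sumTY = 0;
        for (int i = 0; i < m; i++) {
            double y = Math.log(n[i]);       // linearize: ln N = ln c - lambda * t
            sumT += t[i];
            sumY += y;
            sumTT += t[i] * t[i];
            sumTY += t[i] * y;
        }
        double slope = (m * sumTY - sumT * sumY) / (m * sumTT - sumT * sumT);
        return -slope;                       // the slope of ln N is -lambda
    }

    public static void main(String[] args) {
        double lambdaTrue = 0.3, c = 1000.0;
        Random rng = new Random(42);         // fixed seed for reproducibility
        double[] t = new double[10];
        double[] n = new double[10];
        for (int i = 0; i < t.length; i++) {
            t[i] = i;
            double expected = c * Math.exp(-lambdaTrue * t[i]);
            // assumed error model: Gaussian with sigma = sqrt(expected), kept >= 1 so ln is defined
            n[i] = Math.max(1.0, expected + Math.sqrt(expected) * rng.nextGaussian());
        }
        System.out.printf("estimated lambda = %.3f (true %.3f)%n", estimateLambda(t, n), lambdaTrue);
    }
}
```

Because the points are not weighted by their errors, this estimate is only a first approximation. It also does not give the error of $\lambda$, which problem 4 explicitly asks for. Both points belong to the later chapters on maximum likelihood and least squares.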

From these examples some of the aspects of data analysis become apparent.
We see in particular that the outcome of an experiment is not uniquely
determined by the experimental procedure but is also subject to chance: it is a
random variable. This stochastic tendency is either rooted in the nature of the
experiment (test animals are necessarily different, radioactivity is a stochastic
phenomenon), or it is a consequence of the inevitable uncertainties of the experimental
equipment, i.e., measurement errors. It is often useful to simulate
with a computer the variable or stochastic characteristics of the experiment in
order to get an idea of the expected uncertainties of the results before carrying
out the experiment itself. This simulation of random quantities on a computer
is called the Monte Carlo method, so named in reference to games of chance.

1.2  On  the  Structure  of  this  Book 

The basis for using random quantities is the calculus of probabilities. The
most important concepts and rules for this are collected in Chap. 2. Random
variables are introduced in Chap. 3. Here one considers distributions of random
variables, and parameters are defined to characterize the distributions,
such as the expectation value and variance. Special attention is given to the
interdependence of several random variables. In addition, transformations between
different sets of variables are considered; this forms the basis of error
propagation. 

Generating  random  numbers  on  a  computer  and  the  Monte  Carlo  method 
are  the  topics  of  Chap.  4.  In  addition  to  methods  for  generating  random 
numbers,  a  well-tested  program  and  also  examples  for  generating  arbitrarily 
distributed  random  numbers  are  given.  Use  of  the  Monte  Carlo  method  for 
problems  of  integration  and  simulation  is  introduced  by  means  of  examples. 
The  method  is  also  used  to  generate  simulated  data  with  measurement  errors, 
with  which  the  data  analysis  routines  of  later  chapters  can  be  demonstrated. 
In Chap. 5 we introduce a number of distributions which are of particular
interest  in  applications.  This  applies  especially  to  the  Gaussian  or  normal 
distribution,  whose  properties  are  studied  in  detail. 

In  practice  a  distribution  must  be  determined  from  a  finite  number  of 
observations,  i.e.,  from  a  sample.  Various  cases  of  sampling  are  considered  in 
Chap. 6. Computer programs are presented for a first rough numerical treatment
and graphical display of empirical data. Functions of the sample, i.e.,
of the individual observations, can be used to estimate the parameters characterizing
the distribution. The requirements that a good estimate should satisfy
are derived. At this stage the quantity $\chi^2$ is introduced. This is the sum of
the squares of the deviations between observed and expected values and is
therefore a suitable indicator of the goodness-of-fit.

The  maximum-likelihood  method,  discussed  in  Chap.  7,  forms  the  core  of 
modern  statistical  analysis.  It  allows  one  to  construct  estimators  with  optimum 
properties.  The  method  is  discussed  for  the  single  and  multiparameter  cases 
and  illustrated  in  a  number  of  examples.  Chapter  8  is  devoted  to  hypothesis 
testing. It contains the most commonly used F, t, and $\chi^2$ tests and in addition
outlines  the  general  points  of  test  theory. 

The  method  of  least  squares,  which  is  perhaps  the  most  widely  used 
statistical  procedure,  is  the  subject  of  Chap.  9.  The  special  cases  of  direct, 
indirect,  and  constrained  measurements,  often  encountered  in  applications, 
are  developed  in  detail  before  the  general  case  is  discussed.  Programs  and 
examples  are  given  for  all  cases.  Every  least-squares  problem,  and  in  general 
every  problem  of  maximum  likelihood,  involves  determining  the  minimum  of 
a  function  of  several  variables.  In  Chap.  10  various  methods  are  discussed 
in  detail,  by  which  such  a  minimization  can  be  carried  out.  The  relative 
efficiency  of  the  procedures  is  shown  by  means  of  programs  and  examples. 

The  analysis  of  variance  (Chap.  11)  can  be  considered  as  an  extension 
of  the  F-test.  It  is  widely  used  in  biological  and  medical  research  to  study 
the dependence of a measured quantity on various experimental conditions
expressed by other variables, or rather to test its independence from them. For
several  variables  rather  complex  situations  can  arise.  Some  simple  numerical 
examples  are  calculated  using  a  computer  program. 

Linear and polynomial regression, the subject of Chap. 12, is a special
case of the least-squares method and has therefore already been treated in
Chap. 9. Before the advent of computers, usually only linear least-squares
problems were tractable. A special terminology, still used, was developed for
this case. It therefore seemed justified to devote a special chapter to this subject.
At the same time it extends the treatment of Chap. 9. For example, the
determination of confidence intervals for a solution and the relation between
regression and analysis of variance are studied. A general program for polynomial
regression is given and its use is shown in examples.
In the last chapter the elements of time series analysis are introduced.
This method is used if data are given as a function of a controlled variable
(usually time) and no theoretical prediction for the behavior of the data as a
function of the controlled variable is known. It is used to try to reduce the statistical
fluctuation of the data without destroying the genuine dependence on
the  controlled  variable.  Since  the  computational  work  in  time  series  analysis 
is  rather  involved,  a  computer  program  is  also  given. 

The  field  of  data  analysis,  which  forms  the  main  part  of  this  book,  can 
be  called  applied  mathematical  statistics.  In  addition,  wide  use  is  made  of 
other  branches  of  mathematics  and  of  specialized  computer  techniques.  This 
material  is  contained  in  the  appendices. 

In  Appendix  A,  titled  “Matrix  Calculations”,  the  most  important 
concepts and methods from linear algebra are summarized. Of central importance
are procedures for solving systems of linear equations, in particular the
singular  value  decomposition,  which  provides  the  best  numerical  properties. 

Necessary  concepts  and  relations  of  combinatorics  are  compiled  in 
Appendix  B.  The  numerical  value  of  functions  of  mathematical  statistics  must 
often  be  computed.  The  necessary  formulas  and  algorithms  are  contained  in 
Appendix C. Many of these functions are related to the Euler gamma function
and, like it, can only be computed with approximation techniques. In
Appendix  D  formulas  and  methods  for  gamma  and  related  functions  are  given. 
Appendix  E  describes  further  methods  for  numerical  differentiation,  for  the 
determination  of  zeros,  and  for  interactive  input  and  output  under  Java. 

The  graphical  representation  of  measured  data  and  their  errors  and  in 
many  cases  also  of  a  fitted  function  is  of  special  importance  in  data  analysis. 
In  Appendix  F  a  Java  class  with  a  comprehensive  set  of  graphical  methods  is 
presented.  The  most  important  concepts  of  computer  graphics  are  introduced 
and  all  of  the  necessary  explanations  for  using  this  class  are  given. 

Appendix G.1 contains problems for most chapters. These problems can
be solved with paper and pencil. They should help the reader to understand
the basic concepts and theorems. In some cases simple numerical calculations
must also be carried out. In Appendix G.2 either the solution of each problem
is sketched or the result is simply given. In Appendix G.3 a number of programming
problems are presented. For each one an example solution is given.

The  set  of  appendices  is  concluded  with  a  collection  of  formulas  in 
Appendix H, which should facilitate reference to the most important equations,
and with a short collection of statistical tables in Appendix I. Although
all  of  the  tabulated  values  can  be  computed  (and  in  fact  were  computed)  with 
the  programs  of  Appendix  C,  it  is  easier  to  look  up  one  or  two  values  from 
the  tables  than  to  use  a  computer. 
1.3 About the Computer Programs

For the present edition all programs were newly written in the programming
language Java. For some time now, Java has been taught in many schools, so young
readers are often already familiar with the language. Java classes are directly
executable on all popular computers, independently of the operating system.
Java source programs are compiled using the Java Development Kit, which
for many operating systems, in particular Windows, Linux, and Mac OS X, can
be downloaded free of cost from the Internet,
http://www.oracle.com/technetwork/java/index.html 

There  are  four  groups  of  computer  programs  discussed  in  this  book. 
These  are 

•  The  data  analysis  library  in  the  form  of  the  package  datan, 

•  The  graphics  library  in  the  form  of  the  package  datangraphics, 

•  A  collection  of  example  programs  in  the  package  examples, 

•  Solutions  to  the  programming  problems  in  the  package  solutions. 

The programs of all groups are available both as compiled classes and
(except for datangraphics.DatanGraphics) also as source files. In
addition, there is extensive documentation in HTML format, as is typical for Java.

Every class and method of the package datan deals with a particular,
well-defined problem, which is extensively described in the text. That also
holds for the graphics library, which allows one to produce practically any type of
two-dimensional line graphics. For many purposes, however, it suffices to
use one of five classes, each of which yields a complete graphic.

In  order  to  solve  a  specific  problem  the  user  has  to  write  a  short  class 
in  Java,  which  essentially  consists  of  calling  classes  from  the  data  analysis 
library,  and  which  in  certain  cases  organizes  the  input  of  the  user’s  data  and 
output  of  the  results.  The  example  programs  are  a  collection  of  such  classes. 
The  application  of  each  method  from  the  data  analysis  and  graphics  libraries 
is  demonstrated  in  at  least  one  example  program.  Such  example  programs  are 
described  in  a  special  section  near  the  end  of  most  chapters. 

Near the end of the book there is a List of Computer Programs in alphabetical
order. For each program from the data analysis library and from the
graphics  library  page  numbers  are  given,  for  an  explanation  of  the  program 
itself,  and  for  one  or  several  example  programs  demonstrating  its  use. 

The programming problems, like the example programs, are designed to
help the reader in using computer methods. Working through these problems
should enable readers to formulate their own specific tasks in data analysis
to be solved on a computer. For all programming problems, programs exist
which represent a possible solution.

In  data  analysis,  of  course,  data  play  a  special  role.  The  type  of  data  and 
the  format  in  which  they  are  presented  to  the  computer  cannot  be  defined  in 
a  general  textbook  since  it  depends  very  much  on  the  particular  problem  at 
hand.  In  order  to  have  somewhat  realistic  data  for  our  examples  and  problems 
we  have  decided  to  produce  them  in  most  cases  within  the  program  using 
the  Monte  Carlo  method.  It  is  particularly  instructive  to  simulate  data  with 
known  properties  and  a  given  error  distribution  and  to  subsequently  analyze 
these  data.  In  the  analysis  one  must  in  general  make  an  assumption  about  the 
distribution  of  the  errors.  If  this  assumption  is  not  correct,  then  the  results 
of  the  analysis  are  not  optimal.  Effects  that  are  often  decisively  important 
in  practice  can  be  “experienced”  with  exercises  combining  simulation  and 
analysis. 
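The strategy of simulating data with known properties and then analyzing them can be sketched in a few lines. In this hypothetical example, the class and method names are ours and are not the book's. Measurements of a known true value are generated with Gaussian errors of known width σ using `java.util.Random.nextGaussian`. The true value is then estimated by the sample mean, whose error for known σ is σ/√n. Only the Gaussian error model is assumed here. Replacing it by a different distribution while keeping the same analysis is one way to "experience" the effect of a wrong assumption.

```java
import java.util.Random;

/**
 * Hypothetical sketch of "simulate, then analyze": generate measurements of a
 * known true value with Gaussian errors of known width, then estimate the true
 * value from the simulated sample by its arithmetic mean.
 */
class SimulateAndAnalyze {

    /** Simulates n measurements x = trueValue + sigma * (standard normal deviate). */
    static double[] simulate(double trueValue, double sigma, int n, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = trueValue + sigma * rng.nextGaussian();
        }
        return x;
    }

    /** Arithmetic mean of the sample. */
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    public static void main(String[] args) {
        double trueValue = 10.0, sigma = 0.5;
        int n = 100;
        double[] x = simulate(trueValue, sigma, n, 1L);
        // With known sigma, the error of the mean is sigma / sqrt(n).
        System.out.printf("mean = %.3f +- %.3f (true value %.1f)%n",
                mean(x), sigma / Math.sqrt(n), trueValue);
    }
}
```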

Here are some short hints concerning the installation of our programs.
As material accompanying this book, available from the page
extras.springer.com, there is a zip file named DatanJ. Download
this file, unzip it while keeping the internal tree structure of subdirectories,
and store it on your computer in a new directory. (It is convenient to also
give  that  directory  the  name  DatanJ.)  Further  action  is  described  in  the  file 
ReadME  in  that  directory. 


2.  Probabilities 


2.1  Experiments,  Events,  Sample  Space 

Since  in  this  book  we  are  concerned  with  the  analysis  of  data  originating  from 
experiments,  we  will  have  to  state  first  what  we  mean  by  an  experiment  and 
its  result.  Just  as  in  the  laboratory,  we  define  an  experiment  to  be  a  strictly 
followed procedure, as a consequence of which a quantity or a set of quantities
is obtained that constitutes the result. These quantities are continuous
(temperature, length, current) or discrete (number of particles, birthday of a
person, one of three possible colors). No matter how accurately all conditions
of the procedure are maintained, the results of repetitions of an experiment
will in general differ. This is caused either by the intrinsic statistical nature of
the phenomenon under investigation or by the finite accuracy of the measurement.
The possible results will therefore always be spread over a finite region
for  each  quantity.  All  of  these  regions  for  all  quantities  that  make  up  the  result 
of  an  experiment  constitute  the  sample  space  of  that  experiment.  Since  it  is 
difficult  and  often  impossible  to  determine  exactly  the  accessible  regions  for 
the  quantities  measured  in  a  particular  experiment,  the  sample  space  actually 
used  may  be  larger  and  may  contain  the  true  sample  space  as  a  subspace.  We 
shall  use  this  somewhat  looser  concept  of  a  sample  space. 

Example  2.1:  Sample  space  for  continuous  variables 

In the manufacture of resistors it is important to maintain the values R (electrical
resistance measured in ohms) and N (maximum heat dissipation measured
in watts) at given values. The sample space for R and N is a plane spanned
by axes labeled R and N. Since both quantities are always positive, the first
quadrant of this plane is itself a sample space. ■
Example 2.2: Sample space for discrete variables

In practice the exact values of R and N are unimportant as long as they are
contained within a certain interval about the nominal value (e.g., 99 kΩ <
R < 101 kΩ, 0.49 W < N < 0.60 W). If this is the case, we shall say that the
resistor has the properties $R_n$, $N_n$. If the value falls below (above) the lower
(upper) limit, then we shall replace the index n by − (+). The possible values
of resistance and heat dissipation are therefore $R_-$, $R_n$, $R_+$, $N_-$, $N_n$, $N_+$.
The sample space now consists of nine points:

$$
\begin{array}{ccc}
R_-N_-, & R_-N_n, & R_-N_+,\\
R_nN_-, & R_nN_n, & R_nN_+,\\
R_+N_-, & R_+N_n, & R_+N_+.
\end{array}
$$
■
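The step from the continuous sample space of Example 2.1 to the nine discrete points of Example 2.2 is a simple classification. The sketch below is hypothetical: the class `ResistorClassifier` and its methods are ours. The tolerance limits are the ones quoted in the text. We assume that a value lying exactly on a limit counts as out of tolerance, because the text writes the nominal intervals with strict inequalities.

```java
/**
 * Hypothetical sketch: map a continuous measurement (R, N) onto one of the
 * nine discrete sample-space points R_xN_y of Example 2.2, x, y in {-, n, +}.
 * A value exactly on a limit is treated as out of tolerance (our assumption).
 */
class ResistorClassifier {

    /** Returns "-", "n", or "+" for a value relative to the open interval (lower, upper). */
    static String index(double value, double lower, double upper) {
        if (value <= lower) return "-";
        if (value >= upper) return "+";
        return "n";
    }

    /** Sample-space point for resistance r (in kOhm) and heat dissipation n (in W). */
    static String classify(double rKiloOhm, double nWatt) {
        return "R" + index(rKiloOhm, 99.0, 101.0) + "N" + index(nWatt, 0.49, 0.60);
    }

    public static void main(String[] args) {
        System.out.println(classify(100.2, 0.50)); // prints RnNn: meets the specifications
        System.out.println(classify(98.7, 0.65));  // prints R-N+: R too low, N too high
    }
}
```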

Often one or more particular subspaces of the sample space are of special
interest. In Example 2.2, for instance, the point $R_nN_n$ represents the
case where the resistors meet the production specifications. We can give such
subspaces names, e.g., A, B, ..., and say that if the result of an experiment
falls into one such subspace, then the event A (or B, C, ...) has occurred. If A
has not occurred, we speak of the complementary event $\bar{A}$ (i.e., not A). The
whole sample space corresponds to an event that will occur in every experiment,
which we call E. In the rest of this chapter we shall define what we
mean  by  the  probability  of  the  occurrence  of  an  event  and  present  rules  for 
computations  with  probabilities. 

2.2  The  Concept  of  Probability 

Let  us  consider  the  simplest  experiment,  namely,  the  tossing  of  a  coin.  Like 
the  throwing  of  dice  or  certain  problems  with  playing  cards  it  is  of  no  practical 
interest  but  is  useful  for  didactic  purposes.  What  is  the  probability  that  a  “fair” 
coin shows “heads” when tossed once? Our intuition suggests that this probability
is equal to 1/2. It is based on the assumption that all points in sample
space (there are only two points: “heads” and “tails”) are equally probable and
on the convention that we give the event E (here: “heads” or “tails”) a probability
of unity. This way of determining probabilities can be applied only to
symmetric experiments and is therefore of little practical use. (It is, however,
of great importance in statistical physics and quantum statistics, where the
equal probability of all allowed states is an essential postulate of very successful
theories.) If no such perfect symmetry exists, which will be the case even
with normal “physical” coins, the following procedure seems reasonable.
In a large number N of experiments the event A is observed to occur n
times. We define

$$
P(A) = \lim_{N\to\infty} \frac{n}{N} \tag{2.2.1}
$$
as the probability of the occurrence of the event A. This somewhat loose frequency
definition of probability is sufficient for practical purposes, although
it is mathematically unsatisfactory. One of the difficulties with this definition
is the need for an infinity of experiments, which are of course impossible
to perform and even difficult to imagine. Although we shall in fact use the
frequency definition in this book, we will indicate the basic concepts of an
axiomatic theory of probability due to Kolmogorov [1]. The minimal set
of axioms generally used is the following:

(a) To each event A there corresponds a non-negative number, its probability,

$$
P(A) \ge 0 . \tag{2.2.2}
$$

(b)  The  event  E  has  unit  probability, 

$$
P(E) = 1 . \tag{2.2.3}
$$

(c)  If  A  and  B  are  mutually  exclusive  events,  then  the  probability  of  A  or 
B  (written  A  +  B)  is 

$$
P(A + B) = P(A) + P(B) . \tag{2.2.4}
$$

From  these  axioms*  one  obtains  immediately  the  following  useful  results. 
From  (b)  and  (c): 

$$
P(A + \bar{A}) = P(A) + P(\bar{A}) = 1 , \tag{2.2.5}
$$

and furthermore with (a):

$$
0 \le P(A) \le 1 . \tag{2.2.6}
$$

From (c) one can easily obtain the more general theorem for mutually exclusive
events A, B, C, ...,

$$
P(A + B + C + \cdots) = P(A) + P(B) + P(C) + \cdots . \tag{2.2.7}
$$

It should be noted that summing the probabilities of events combined with
“or” here refers only to mutually exclusive events. If one must deal with events
that are not of this type, then they must first be decomposed into mutually
exclusive ones. In throwing a die, A may signify even, B odd, C less than
4 dots, D 4 or more dots. Suppose one is interested in the probability for the
event A or C, which are obviously not exclusive. One forms A and C (written
AC) as well as AD, BC, and BD, which are mutually exclusive, and finds for
A or C (sometimes written A + C) the expression AC + AD + BC. Note that
the axioms do not prescribe a method for assigning the value of a particular
probability P(A).

*Sometimes the definition (2.3.1) is included as a fourth axiom.
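The decomposition of "A or C" can be checked by simply counting faces, assuming a fair die so that each face has probability 1/6. The sketch below uses a hypothetical class `DieEvents`. It compares the direct count of "A or C" with the sum over the mutually exclusive events AC, AD, BC. It also shows that naively adding P(A) and P(C) counts face 2 twice.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch: enumerate the six faces of a fair die to check the
 * decomposition of "A or C" into the mutually exclusive events AC, AD, BC.
 * A = even, B = odd, C = fewer than 4 dots, D = 4 or more dots.
 */
class DieEvents {

    static boolean a(int k) { return k % 2 == 0; } // even
    static boolean b(int k) { return k % 2 == 1; } // odd
    static boolean c(int k) { return k < 4; }      // fewer than 4 dots
    static boolean d(int k) { return k >= 4; }     // 4 or more dots

    /** Number of faces k = 1..6 for which the event occurs; its probability is count / 6. */
    static int count(IntPredicate event) {
        int n = 0;
        for (int k = 1; k <= 6; k++) {
            if (event.test(k)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int direct = count(k -> a(k) || c(k));
        int decomposed = count(k -> a(k) && c(k))  // AC
                       + count(k -> a(k) && d(k))  // AD
                       + count(k -> b(k) && c(k)); // BC
        int naive = count(DieEvents::a) + count(DieEvents::c); // face 2 counted twice
        System.out.println("P(A or C) = " + direct + "/6, via AC + AD + BC = " + decomposed
                + "/6, naive P(A) + P(C) = " + naive + "/6");
    }
}
```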

Finally it should be pointed out that the word probability is often used in
common language in a sense that is different from, or even opposed to, that considered
by us. This is subjective probability, where the probability of an event is
given by the measure of our belief in its occurrence. An example of this is:
“The probability that the party A will win the next election is 1/3.” As another
example consider the case of a certain track in nuclear emulsion which could
have been left by a proton or a pion. One often says: “The track was caused by
a pion with probability 1/2.” But since the event had already taken place and
only one of the two kinds of particle could have caused that particular track,
the probability in question is either 0 or 1, but we do not know which.
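The frequency definition (2.2.1) can be explored with a small simulation. The class `CoinFrequency` is hypothetical. The sketch assumes that the pseudorandom generator `java.util.Random` is an adequate stand-in for a fair coin. It prints $n/N$ for increasing N. The relative frequency scatters around 1/2 with a shrinking spread, but no finite simulation can actually carry out the limit $N\to\infty$.

```java
import java.util.Random;

/**
 * Hypothetical sketch: illustrate the frequency definition (2.2.1) by
 * simulating N tosses of a fair coin and computing the relative frequency
 * n/N of "heads". java.util.Random stands in for the coin.
 */
class CoinFrequency {

    /** Relative frequency of heads in nTosses simulated tosses. */
    static double relativeFrequency(long nTosses, long seed) {
        Random rng = new Random(seed);
        long heads = 0;
        for (long i = 0; i < nTosses; i++) {
            if (rng.nextBoolean()) heads++;   // true is counted as "heads"
        }
        return (double) heads / nTosses;
    }

    public static void main(String[] args) {
        for (long n = 10; n <= 1_000_000; n *= 10) {
            System.out.printf("N = %7d   n/N = %.4f%n", n, relativeFrequency(n, 12345L));
        }
    }
}
```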


2.3  Rules  of  Probability  Calculus:  Conditional  Probability 


Suppose  the  result  of  an  experiment  has  the  property  A.  We  now  ask  for  the 
probability  that  it  also  has  the  property  B,  i.e.,  the  probability  of  B  under  the 
condition A. We define this conditional probability as

$$
P(B\,|\,A) = \frac{P(AB)}{P(A)} . \tag{2.3.1}
$$


It  follows  that 

$$
P(AB) = P(A)\,P(B\,|\,A) . \tag{2.3.2}
$$

One can also use (2.3.2) directly for the definition, since here the requirement
$P(A) \neq 0$ is not necessary. From Fig. 2.1 it can be seen that this definition is
reasonable.  Consider  the  event  A  to  occur  if  a  point  is  in  the  region  labeled 
A,  and  correspondingly  for  the  event  (and  region)  B.  For  the  overlap  region 
both  A  and  B  occur,  i.e.,  the  event  (AB)  occurs.  Let  the  area  of  the  different 
regions  be  proportional  to  the  probabilities  of  the  corresponding  events.  Then 
the  probability  of  B  under  the  condition  A  is  the  ratio  of  the  area  AB  to  that 
of  A.  In  particular  this  is  equal  to  unity  if  A  is  contained  in  B  and  zero  if  the 
overlapping  area  vanishes. 

Using conditional probability we can now formulate the rule of total
probability. Consider an experiment that can lead to one of n possible mutually
exclusive events,

$$
E = A_1 + A_2 + \cdots + A_n . \tag{2.3.3}
$$
The probability for the occurrence of any event with the property B is

$$
P(B) = \sum_{i=1}^{n} P(A_i)\,P(B\,|\,A_i) , \tag{2.3.4}
$$

as can be seen easily from (2.3.2) and (2.2.7).
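For a small worked check of (2.3.4), take the die of Sect. 2.2 with A even, B odd, and C fewer than 4 dots, and assume a fair die. A and B are mutually exclusive and A + B = E, so they form a decomposition of the type (2.3.3) with n = 2. Only AC = {2} and BC = {1, 3} contribute:

$$
\begin{aligned}
P(C\,|\,A) &= \frac{P(AC)}{P(A)} = \frac{1/6}{1/2} = \frac{1}{3} , \qquad
P(C\,|\,B) = \frac{P(BC)}{P(B)} = \frac{2/6}{1/2} = \frac{2}{3} ,\\
P(C) &= P(A)\,P(C\,|\,A) + P(B)\,P(C\,|\,B) = \frac{1}{2}\cdot\frac{1}{3} + \frac{1}{2}\cdot\frac{2}{3} = \frac{1}{2} .
\end{aligned}
$$

This agrees with the direct count: C contains the three faces 1, 2, 3, so P(C) = 3/6 = 1/2.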


Fig. 2.1: Illustration of conditional probability.


We  can  now  also  define  the  independence  of  events.  Two  events  A  and 
B  are  said  to  be  independent  if  the  knowledge  that  A  has  occurred  does  not 
change  the  probability  for  B  and  vice  versa,  i.e.,  if 

$$P(B|A) = P(B) , \tag{2.3.5}$$

or, by use of (2.3.2),

$$P(AB) = P(A)\,P(B) . \tag{2.3.6}$$

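In general several decompositions of the type (2.3.3),

$$\begin{aligned} E &= A_1 + A_2 + \cdots + A_n , \\ E &= B_1 + B_2 + \cdots + B_m , \\ &\;\;\vdots \\ E &= Z_1 + Z_2 + \cdots + Z_l , \end{aligned} \tag{2.3.7}$$

are said to be independent if for all possible combinations α, β, ..., ω the
condition

$$P(A_\alpha B_\beta \cdots Z_\omega) = P(A_\alpha)\,P(B_\beta) \cdots P(Z_\omega) \tag{2.3.8}$$

is fulfilled.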


2.4  Examples 

2.4.1  Probability  for  n  Dots  in  the  Throwing  of  Two  Dice 

If $n_1$ and $n_2$ are the numbers of dots on the individual dice and if $n = n_1 + n_2$,
then one has $P(n_i) = 1/6$; $i = 1, 2$; $n_i = 1, 2, \ldots, 6$. Because the two dice
are independent of each other one has $P(n_1, n_2) = P(n_1)P(n_2) = 1/36$. By
considering in how many different ways the sum $n = n_1 + n_2$ can be formed
one obtains


$$\begin{aligned}
P_2(2) &= P(1,1) = 1/36 , \\
P_2(3) &= P(1,2) + P(2,1) = 2/36 , \\
P_2(4) &= P(1,3) + P(2,2) + P(3,1) = 3/36 , \\
P_2(5) &= P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4/36 , \\
P_2(6) &= P(1,5) + P(2,4) + P(3,3) + P(4,2) + P(5,1) = 5/36 , \\
P_2(7) &= P(1,6) + P(2,5) + P(3,4) + P(4,3) + P(5,2) + P(6,1) = 6/36 , \\
P_2(8) &= P_2(6) = 5/36 , \\
P_2(9) &= P_2(5) = 4/36 , \\
P_2(10) &= P_2(4) = 3/36 , \\
P_2(11) &= P_2(3) = 2/36 , \\
P_2(12) &= P_2(2) = 1/36 .
\end{aligned}$$

Of course, the normalization condition $\sum_{k=2}^{12} P_2(k) = 1$ is fulfilled.
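The counting argument above can be reproduced by brute-force enumeration of all ordered pairs. A minimal Python sketch; the dictionary name `P2` is only a label for the probabilities $P_2(n)$ listed above. Because `Fraction` reduces 6/36 to 1/6, the printout uses the raw counts instead.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Number of ordered pairs (n1, n2) that give each sum n = n1 + n2.
counts = Counter(n1 + n2 for n1, n2 in product(range(1, 7), repeat=2))

# Each pair has probability P(n1) P(n2) = 1/36 because the dice are independent.
P2 = {n: Fraction(counts[n], 36) for n in range(2, 13)}

for n in range(2, 13):
    print(f"P2({n:2d}) = {counts[n]}/36")

# Normalization: the probabilities of all possible sums add up to 1.
assert sum(P2.values()) == 1
```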


2.4.2  Lottery  6  Out  of  49 


A  container  holds  49  balls  numbered  1  through  49.  During  the  drawing  6 
balls  are  taken  out  of  the  container  consecutively  and  none  are  put  back  in. 
We compute the probabilities P(1), P(2), ..., P(6) that a player, who before
the drawing has chosen six of the numbers 1, 2, ..., 49, has predicted exactly
1, 2, ..., or 6 of the drawn numbers.

First we compute P(6). The probability to choose as the first number
the one which will also be drawn first is obviously 1/49. If that step was
successful, then the probability to choose as the second number the one which
is also drawn second is 1/48. We conclude that the probability for choosing
six numbers correctly in the order in which they are drawn is

$$\frac{1}{49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44} = \frac{43!}{49!} .$$

The order, however, is irrelevant. Since there are 6! possible ways to arrange
six numbers in different orders we have

$$P(6) = \frac{6!\,43!}{49!} = \frac{1}{\binom{49}{6}} .$$
That is exactly the inverse of the number of combinations $C^{49}_6$ of 6 elements
out of 49 (see Appendix B), since all of these combinations are equally probable but only one of them contains only the drawn numbers.

We  may  now  argue  that  the  container  holds  two  kinds  of  balls,  namely  6 
balls  in  which  the  player  is  interested  since  they  carry  the  numbers  which  he 
selected,  and  43  balls  whose  numbers  the  player  did  not  select.  The  result  of 
the  drawing  is  a  sample  from  a  set  of  49  elements  of  which  6  are  of  one  kind 
and  43  are  of  the  other.  The  sample  itself  contains  6  elements  which  are  drawn 
without  putting  elements  back  into  the  container.  This  method  of  sampling  is 
described  by  the  hypergeometric  distribution  (see  Sect.  5.3).  The  probability 
for predicting correctly ℓ out of the 6 drawn numbers is

$$P(\ell) = \frac{\binom{6}{\ell}\binom{43}{6-\ell}}{\binom{49}{6}} , \qquad \ell = 0, \ldots, 6 .$$
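The hypergeometric formula is easy to evaluate with Python's `math.comb` (available since Python 3.8). This sketch also checks that the seven probabilities sum to 1 and that P(6) agrees with the ordered-drawing argument 6! 43!/49!; the constant names are illustrative.

```python
from math import comb, factorial

N_BALLS, N_CHOSEN, N_DRAWN = 49, 6, 6


def p_correct(hits):
    """Hypergeometric probability of exactly `hits` correctly predicted numbers."""
    return (comb(N_CHOSEN, hits) * comb(N_BALLS - N_CHOSEN, N_DRAWN - hits)
            / comb(N_BALLS, N_DRAWN))


probs = [p_correct(hits) for hits in range(N_DRAWN + 1)]
for hits, p in enumerate(probs):
    print(f"P({hits}) = {p:.6e}")

# P(6) from the ordered-drawing argument: 6! 43! / 49! = 1 / C(49, 6).
p6_ordered = factorial(6) * factorial(43) / factorial(49)
```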


2.4.3  Three-Door  Game 

In  a  TV  game  show  a  candidate  is  given  the  following  problem.  Three  rooms 
are  closed  by  three  identical  doors.  One  room  contains  a  luxury  car,  the  other 
two  each  contain  a  goat.  The  candidate  is  asked  to  guess  behind  which  of 
the  doors  the  car  is.  He  chooses  a  door  which  we  will  call  A.  The  door  A, 
however,  remains  closed  for  the  moment.  Of  course,  behind  at  least  one  of  the 
other  doors  there  is  a  goat.  The  quiz  master  now  opens  one  door  which  we 
will call B to reveal a goat. He now gives the candidate the chance to either
stay with the original choice A or to choose the remaining closed door C. Can the
candidate increase his or her chances by choosing C instead of A?

The answer (astonishing for many) is yes. The probability to find the car
behind the door A obviously is P(A) = 1/3. Then the probability that the car
is behind one of the other doors is $P(\bar{A}) = 2/3$. The candidate exhausts this
probability fully if he chooses the door C since through the opening of B it is
shown to be a door without the car, so that $P(C) = P(\bar{A})$.
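The argument can be checked by simulation. A minimal Python sketch; the assumption (not stated in the text) is that when the candidate's first choice hides the car, the quiz master opens one of the two goat doors at random. Both strategies are played on the same sequence of games, so exactly one of them wins each round.

```python
import random


def play(switch, rng):
    """One round of the three-door game; returns True if the candidate wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    a = rng.choice(doors)  # candidate's first choice, door A
    # The quiz master opens a door B that is neither A nor hides the car.
    # Assumption: if two such doors exist, he picks one of them at random.
    b = rng.choice([d for d in doors if d not in (a, car)])
    c = next(d for d in doors if d not in (a, b))  # the remaining closed door C
    return (c if switch else a) == car


def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(win_rate(switch=False), win_rate(switch=True))  # roughly 1/3 and 2/3
```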


3.  Random  Variables:  Distributions 


3.1  Random  Variables 

We will now consider not the probability of observing particular events but
rather the events themselves and try to find a particularly simple way of classifying them. We can, for instance, associate the event “heads” with the number 0 and the event “tails” with the number 1. Generally we can classify the
events of the decomposition (2.3.3) by associating each event $A_i$ with the real
number i. In this way each event can be characterized by one of the possible
values of a random variable. Random variables can be discrete or continuous.
We denote them by symbols like $\mathbf{x}, \mathbf{y}, \ldots$.

Example  3.1:  Discrete  random  variable 

It  may  be  of  interest  to  study  the  number  of  coins  still  in  circulation  as  a 
function  of  their  age.  It  is  obviously  most  convenient  to  use  the  year  of  issue 
stamped on each coin directly as the (discrete) random variable, e.g., $\mathbf{x} = \ldots,
1949, 1950, 1951, \ldots$ ■

Example  3.2:  Continuous  random  variable 

All  processes  of  measurement  or  production  are  subject  to  smaller  or  larger 
imperfections  or  fluctuations  that  lead  to  variations  in  the  result,  which  is 
therefore  described  by  one  or  several  random  variables.  Thus  the  values  of 
electrical resistance and maximum heat dissipation characterizing a resistor
in Example 2.1 are continuous random variables. ■

3.2  Distributions  of  a  Single  Random  Variable 

From the classification of events we return to probability considerations. We
consider the random variable $\mathbf{x}$ and a real number $x$, which can assume any
value between $-\infty$ and $+\infty$, and study the probability for the event $\mathbf{x} < x$.

S.  Brandt,  Data  Analysis:  Statistical  and  Computational  Methods  for  Scientists  and  Engineers,  15 
This probability is a function of x and is called the (cumulative) distribution
function of $\mathbf{x}$:

$$F(x) = P(\mathbf{x} < x) . \tag{3.2.1}$$

If  x  can  assume  only  a  finite  number  of  discrete  values,  e.g.,  the  number  of 
dots  on  the  faces  of  a  die,  then  the  distribution  function  is  a  step  function.  It 
is  shown  in  Fig.  3.1  for  the  example  mentioned  above.  Obviously  distribution 
functions  are  always  monotonic  and  non-decreasing. 


Fig. 3.1 : Distribution function for throwing
a symmetric die.


Because of (2.2.3) one has the limiting case

$$\lim_{x\to\infty} F(x) = \lim_{x\to\infty} P(\mathbf{x} < x) = P(E) = 1 . \tag{3.2.2}$$


Applying Eqs. (2.2.5) and (3.2.1) we obtain

$$P(\mathbf{x} \ge x) = 1 - F(x) = 1 - P(\mathbf{x} < x) \tag{3.2.3}$$

and therefore

$$\lim_{x\to-\infty} F(x) = \lim_{x\to-\infty} P(\mathbf{x} < x) = 1 - \lim_{x\to-\infty} P(\mathbf{x} \ge x) = 0 . \tag{3.2.4}$$


Of  special  interest  are  distribution  functions  F(x)  that  are  continuous  and 
differentiable.  The  first  derivative 


$$f(x) = F'(x) = \frac{dF(x)}{dx} \tag{3.2.5}$$

is called the probability density (function) of $\mathbf{x}$. It is a measure of the probability of the event $(x \le \mathbf{x} < x + dx)$. From (3.2.1) and (3.2.5) it immediately
follows that

$$F(x) = \int_{-\infty}^{x} f(x)\,dx , \tag{3.2.6}$$

$$P(a \le \mathbf{x} < b) = \int_a^b f(x)\,dx = F(b) - F(a) , \tag{3.2.7}$$
and in particular

$$\int_{-\infty}^{\infty} f(x)\,dx = 1 . \tag{3.2.8}$$

A  trivial  example  of  a  continuous  distribution  is  given  by  the  angular 
position  of  the  hand  of  a  watch  read  at  random  intervals.  We  obtain  a  constant 
probability  density  (Fig.  3.2). 


Fig. 3.2 : Distribution function and probability density for the angular position of a watch
hand.


3.3 Functions of a Single Random Variable, Expectation Value, Variance, Moments

In  addition  to  the  distribution  of  a  random  variable  x,  we  are  often  interested 
in  the  distribution  of  a  function  of  x.  Such  a  function  of  a  random  variable  is 
also  a  random  variable: 

y  =  H(x)  .  (3.3.1) 

The  variable  y  then  possesses  a  distribution  function  and  probability  density 
in  the  same  way  as  x. 

In the two simple examples of the last section we were able to give the distribution function immediately because of the symmetric nature of the problems. Usually this is not possible. Instead, we have to obtain it from experiment. Often we are limited to determining a few characteristic parameters
instead of the complete distribution.

The mean or expectation value of a random variable is the sum of all
possible values $x_i$ of $\mathbf{x}$ multiplied by their corresponding probabilities

$$E(\mathbf{x}) = \hat{x} = \sum_{i=1}^{n} x_i\,P(\mathbf{x} = x_i) . \tag{3.3.2}$$
Note that $\hat{x}$ is not a random variable but rather has a fixed value. Correspondingly the expectation value of a function (3.3.1) is defined to be

$$E\{H(\mathbf{x})\} = \sum_{i=1}^{n} H(x_i)\,P(\mathbf{x} = x_i) . \tag{3.3.3}$$


In  the  case  of  a  continuous  random  variable  (with  a  differentiable  distribution 
function),  we  define  by  analogy 


$$E(\mathbf{x}) = \hat{x} = \int_{-\infty}^{\infty} x f(x)\,dx \tag{3.3.4}$$

and

$$E\{H(\mathbf{x})\} = \int_{-\infty}^{\infty} H(x) f(x)\,dx . \tag{3.3.5}$$

If we choose in particular

$$H(\mathbf{x}) = (\mathbf{x} - c)^\ell , \tag{3.3.6}$$

we obtain the expectation values

$$\alpha_\ell = E\{(\mathbf{x} - c)^\ell\} , \tag{3.3.7}$$

which are called the ℓ-th moments of the variable about the point c. Of special
interest are the moments about the mean,

$$\mu_\ell = E\{(\mathbf{x} - \hat{x})^\ell\} . \tag{3.3.8}$$

The lowest moments are obviously

$$\mu_0 = 1 , \quad \mu_1 = 0 . \tag{3.3.9}$$

The quantity

$$\mu_2 = \sigma^2(\mathbf{x}) = \operatorname{var}(\mathbf{x}) = E\{(\mathbf{x} - \hat{x})^2\} \tag{3.3.10}$$

is  the  lowest  moment  containing  information  about  the  average  deviation  of 
the  variable  x  from  its  mean.  It  is  called  the  variance  of  x. 

We will now try to visualize the practical meaning of the expectation
value and variance of a random variable x. Let us consider the measurement of some quantity, for example, the length $x_0$ of a small crystal using
a microscope. Because of the influence of different factors, such as the imperfections of the different components of the microscope and observational
errors, repetitions of the measurement will yield slightly different results for
x. The individual measurements will, however, tend to group themselves in
the neighborhood of the true value of the length to be measured, i.e., it will
be more probable to find a value of x near to $x_0$ than far from it, provided
no systematic biases exist. The probability density of x will therefore have a
bell-shaped form as sketched in Fig. 3.3, although it need not be symmetric. It
seems reasonable, especially in the case of a symmetric probability density,
to interpret the expectation value (3.3.4) as the best estimate of the true value.
It is interesting to note that (3.3.4) has the mathematical form of a center of
gravity, i.e., $\hat{x}$ can be visualized as the x-coordinate of the center of gravity of
the surface under the curve describing the probability density.

The variance (3.3.10),

$$\sigma^2(\mathbf{x}) = \int_{-\infty}^{\infty} (x - \hat{x})^2 f(x)\,dx , \tag{3.3.11}$$


Fig. 3.3  :  Distribution  with  small  variance 
(a)  and  large  variance  (b). 


which has the form of a moment of inertia, is a measure of the width or dispersion of the probability density about the mean. If it is small, the individual
measurements lie close to $\hat{x}$ (Fig. 3.3a); if it is large, they will in general be
farther from the mean (Fig. 3.3b). The positive square root of the variance

$$\sigma = \sqrt{\sigma^2(\mathbf{x})} \tag{3.3.12}$$


is called the standard deviation (or sometimes the dispersion) of x. Like the
variance itself it is a measure of the average deviation of the measurements x
from the expectation value.

Since the standard deviation has the same dimension as x (in our example both have the dimension of length), it is identified with the error of the
measurement,




$$\sigma(\mathbf{x}) = \Delta x .$$

This definition of measurement error is discussed in more detail in Sects. 5.6–5.10. It should be noted that the definitions (3.3.4) and (3.3.10) do not by
themselves provide a way of calculating the mean or the measurement error, since the
probability density describing a measurement is in general unknown.

The third moment about the mean is sometimes called skewness. We prefer to define the dimensionless quantity

$$\gamma = \mu_3/\sigma^3 \tag{3.3.13}$$

to be the skewness of x. It is positive (negative) if the distribution is skewed
to the right (left) of the mean. For symmetric distributions the skewness vanishes. It contains information about a possible difference between positive and
negative deviations from the mean.
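For a discrete distribution the mean (3.3.2), the central moments (3.3.8), and the skewness (3.3.13) reduce to finite sums. A minimal Python sketch using exact fractions where possible; the fair die is symmetric, so its skewness vanishes, while the three-point distribution named `lopsided` is a hypothetical example with a longer tail to the right and hence positive skewness.

```python
from fractions import Fraction


def moments(dist):
    """Mean (3.3.2), variance (3.3.10) and skewness (3.3.13) of a discrete
    distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())

    def mu(order):  # central moment, Eq. (3.3.8)
        return sum((x - mean) ** order * p for x, p in dist.items())

    var = mu(2)
    skew = float(mu(3)) / float(var) ** 1.5
    return mean, var, skew


die = {k: Fraction(1, 6) for k in range(1, 7)}  # symmetric
lopsided = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}  # hypothetical

for name, dist in [("die", die), ("lopsided", lopsided)]:
    mean, var, skew = moments(dist)
    print(name, mean, var, round(skew, 3))
```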

We  will  now  obtain  a  few  important  rules  about  means  and  variances.  In 
the  case  where 

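$$H(\mathbf{x}) = c\mathbf{x} , \quad c = \text{const} , \tag{3.3.14}$$

it follows immediately that

$$E(c\mathbf{x}) = cE(\mathbf{x}) , \quad \sigma^2(c\mathbf{x}) = c^2\sigma^2(\mathbf{x}) , \tag{3.3.15}$$

and therefore

$$\sigma^2(\mathbf{x}) = E\{(\mathbf{x} - \hat{x})^2\} = E\{\mathbf{x}^2 - 2\mathbf{x}\hat{x} + \hat{x}^2\} = E(\mathbf{x}^2) - \hat{x}^2 . \tag{3.3.16}$$

We now consider the function

$$\mathbf{u} = \frac{\mathbf{x} - \hat{x}}{\sigma(\mathbf{x})} . \tag{3.3.17}$$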


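It has the expectation value

$$E(\mathbf{u}) = \frac{1}{\sigma(\mathbf{x})} E(\mathbf{x} - \hat{x}) = \frac{1}{\sigma(\mathbf{x})} (\hat{x} - \hat{x}) = 0 \tag{3.3.18}$$

and variance

$$\sigma^2(\mathbf{u}) = \frac{1}{\sigma^2(\mathbf{x})} E\{(\mathbf{x} - \hat{x})^2\} = \frac{\sigma^2(\mathbf{x})}{\sigma^2(\mathbf{x})} = 1 . \tag{3.3.19}$$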

The function u, which is also a random variable, has particularly simple
properties, which make its use in more involved calculations preferable. We
will call such a variable (having zero mean and unit variance) a reduced variable. It is also called a standardized, normalized, or dimensionless variable.
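A short numerical check of (3.3.16) to (3.3.19), using a fair die as an illustrative discrete distribution (an assumption; any distribution with finite variance would do): the variance agrees with E(x²) − x̂², and the reduced variable has zero mean and unit variance up to floating-point rounding.

```python
import math
from fractions import Fraction

# Fair die as the example distribution {value: probability}.
die = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(x * p for x, p in die.items())
var = sum((x - mean) ** 2 * p for x, p in die.items())        # Eq. (3.3.10)
var_alt = sum(x * x * p for x, p in die.items()) - mean ** 2  # Eq. (3.3.16)
sigma = math.sqrt(var)

# Reduced variable u = (x - mean) / sigma, Eq. (3.3.17): same probabilities, new values.
u_dist = {float(x - mean) / sigma: float(p) for x, p in die.items()}
mean_u = sum(u * p for u, p in u_dist.items())                  # expected 0, Eq. (3.3.18)
var_u = sum((u - mean_u) ** 2 * p for u, p in u_dist.items())  # expected 1, Eq. (3.3.19)
print(var == var_alt, mean_u, var_u)
```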
Although a distribution is mathematically most easily described by its expectation value, variance, and higher moments (in fact any distribution can be
completely specified by these quantities, cf. Sect. 5.5), it is often convenient to
introduce further definitions so as to better visualize the form of a distribution.

The mode $x_m$ (or most probable value) of a distribution is defined as that
value of the random variable that corresponds to the highest probability:

$$P(\mathbf{x} = x_m) = \max . \tag{3.3.20}$$

If the distribution has a differentiable probability density, the mode, which
corresponds to its maximum, is easily determined by the conditions

$$\frac{d}{dx} f(x) = 0 , \quad \frac{d^2}{dx^2} f(x) < 0 . \tag{3.3.21}$$

In many cases only one maximum exists; the distribution is said to be unimodal. The median $x_{0.5}$ of a distribution is defined as that value of the random
variable for which the distribution function equals 1/2:

$$F(x_{0.5}) = P(\mathbf{x} < x_{0.5}) = 0.5 . \tag{3.3.22}$$

In the case of a continuous probability density Eq. (3.3.22) takes the form

$$\int_{-\infty}^{x_{0.5}} f(x)\,dx = 0.5 , \tag{3.3.23}$$

i.e.,  the  median  divides  the  total  range  of  the  random  variable  into  two  regions 
each  containing  equal  probability. 

It  is  clear  from  these  definitions  that  in  the  case  of  a  unimodal  distribution 
with  continuous  probability  density  that  is  symmetric  about  its  maximum,  the 
values  of  mean,  mode,  and  median  coincide.  This  is  not,  however,  the  case  for 
asymmetric  distributions  (Fig.  3.4). 

Fig. 3.4 : Most probable value (mode) $x_m$, mean $\hat{x}$, and median $x_{0.5}$ of an asymmetric distribution.

The definition (3.3.22) can easily be generalized. The quantities $x_{0.25}$ and
$x_{0.75}$ defined by

$$F(x_{0.25}) = 0.25 , \quad F(x_{0.75}) = 0.75 \tag{3.3.24}$$

are called lower and upper quartiles. Similarly we can define deciles $x_{0.1}, x_{0.2}, \ldots, x_{0.9}$, or in general quantiles $x_q$, by

$$F(x_q) = \int_{-\infty}^{x_q} f(x)\,dx = q \tag{3.3.25}$$

with $0 \le q \le 1$.

The definition of quantiles is most easily visualized from Fig. 3.5. In a
plot of the distribution function F(x), the quantile $x_q$ can be read off as the
abscissa corresponding to the value q on the ordinate. The quantile $x_q(q)$,
regarded as a function of the probability q, is simply the inverse of the distribution function.
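The statement that $x_q(q)$ is the inverse of F can be made concrete numerically. A minimal sketch using bisection, assuming only that F is continuous and non-decreasing on a bracketing interval [lo, hi]; the exponential distribution F(x) = 1 − e^{−x} is an illustrative choice (not from the text) whose quantiles −ln(1 − q) are known in closed form.

```python
import math


def quantile(cdf, q, lo, hi, tol=1e-12):
    """Return x_q with cdf(x_q) = q by bisection, i.e., invert the distribution function.

    Assumes cdf is continuous and non-decreasing with cdf(lo) <= q <= cdf(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exp_cdf(x):
    """Illustrative distribution function F(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


median = quantile(exp_cdf, 0.5, 0.0, 50.0)
lower, upper = quantile(exp_cdf, 0.25, 0.0, 50.0), quantile(exp_cdf, 0.75, 0.0, 50.0)
print(median, lower, upper)  # compare with -ln(1 - q): ln 2, ln(4/3), ln 4
```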


Fig. 3.5 : Median and quantiles
of a continuous distribution.


Example  3.3:  Uniform  distribution 

We will now discuss the simplest case of a distribution function of a continuous variable. Suppose that in the interval $a \le x < b$ the probability density of
x is constant, and it is zero outside of this interval:

$$f(x) = \begin{cases} c , & a \le x < b , \\ 0 , & x < a , \; x \ge b . \end{cases} \tag{3.3.26}$$

Because of (3.2.8) one has

$$\int_{-\infty}^{\infty} f(x)\,dx = c\int_a^b dx = c(b - a) = 1$$

or

$$f(x) = \begin{cases} \dfrac{1}{b - a} , & a \le x < b , \\ 0 , & x < a , \; x \ge b . \end{cases} \tag{3.3.27}$$




The distribution function is

$$F(x) = \begin{cases} 0 , & x < a , \\ \displaystyle\int_a^x \frac{dx}{b - a} = \frac{x - a}{b - a} , & a \le x < b , \\ 1 , & x \ge b . \end{cases} \tag{3.3.28}$$

By  symmetry  arguments  the  expectation  value  of  x  must  be  the  arithmetic 
mean of the boundaries a and b. In fact, (3.3.4) immediately gives

$$E(\mathbf{x}) = \hat{x} = \frac{1}{b - a}\int_a^b x\,dx = \frac{1}{2(b - a)}(b^2 - a^2) = \frac{b + a}{2} . \tag{3.3.29}$$

Correspondingly, one obtains from (3.3.10)

$$\sigma^2(\mathbf{x}) = \frac{1}{12}(b - a)^2 . \tag{3.3.30}$$


The uniform distribution is not of great practical interest. It is, however, particularly easy to handle, being the simplest distribution of a continuous variable. It is often advantageous to transform a distribution function
by means of a transformation of variables into a uniform distribution or the
reverse, to express the given distribution in terms of a uniform distribution.
This method is used particularly in the “Monte Carlo method” discussed in
Chap. 4. ■
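As a numerical check of (3.3.29) and (3.3.30), this sketch evaluates the integrals (3.2.8), (3.3.4), and (3.3.10) for the uniform density with the midpoint rule. The interval a = 2, b = 5 and the number of subintervals are arbitrary choices.

```python
def uniform_moments(a, b, n=100_000):
    """Midpoint-rule values of the normalization (3.2.8), the mean (3.3.4) and the
    variance (3.3.10) for the uniform density f(x) = 1/(b - a) on [a, b)."""
    h = (b - a) / n
    f = 1.0 / (b - a)
    xs = [a + (k + 0.5) * h for k in range(n)]
    norm = sum(f * h for _ in xs)
    mean = sum(x * f * h for x in xs)
    var = sum((x - mean) ** 2 * f * h for x in xs)
    return norm, mean, var


a, b = 2.0, 5.0
norm, mean, var = uniform_moments(a, b)
print(norm, mean, var)                 # approximately 1, 3.5, 0.75
print((a + b) / 2, (b - a) ** 2 / 12)  # closed forms (3.3.29), (3.3.30)
```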


Example  3.4:  Cauchy  distribution 

In  the  (x,  y)  plane  a  gun  is  mounted  at  the  point  (x,  y)  =  (0,-1)  such  that  its 
barrel  lies  in  the  (x,  y)  plane  and  can  rotate  around  an  axis  parallel  to  the  z 
axis  (Fig.  3.6). 

The gun is fired such that the angle θ between the barrel and the y axis
is chosen at random from a uniform distribution in the range $-\pi/2 < \theta < \pi/2$,
i.e., the probability density of θ is

$$f(\theta) = \frac{1}{\pi} .$$

Since

$$\theta = \arctan x , \qquad \frac{d\theta}{dx} = \frac{1}{1 + x^2} ,$$

we find by the transformation (cf. Sect. 3.7) θ → x of the variable for the
probability density in x

$$g(x) = \left|\frac{d\theta}{dx}\right| f(\theta) = \frac{1}{\pi}\,\frac{1}{1 + x^2} . \tag{3.3.31}$$
Fig. 3.6 : Model for producing a Cauchy distribution (below) and probability density of the
Cauchy distribution (above).


A  distribution  with  this  probability  density  (in  our  example  of  the  position  of 
hits  on  the  x  axis)  is  called  the  Cauchy  distribution. 

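The expectation value of x is (taking the principal value for the integral)

$$\hat{x} = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x\,dx}{1 + x^2} = 0 .$$

The expression for the variance,

$$\int_{-\infty}^{\infty} x^2 g(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x^2\,dx}{1 + x^2} = \frac{1}{\pi}\Big[x - \arctan x\Big]_{x=-\infty}^{\infty} = \frac{2}{\pi}\lim_{x\to\infty}(x - \arctan x) ,$$

yields an infinite result. One says that the variance of the Cauchy distribution
does not exist.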

One can, however, construct another measure for the width of the distribution, the full width at half maximum ‘FWHM’ (cf. Example 6.3). The
function g(x) has its maximum at $x = \hat{x} = 0$ and reaches half its maximum
value at the points $x_a = -1$ and $x_b = 1$. Therefore,

$$\Gamma = 2$$

is the full width at half maximum of the Cauchy distribution. ■
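The gun model of Example 3.4 is easy to simulate: draw θ uniformly in (−π/2, π/2) and record the hit position x = tan θ. The sketch below (seed and sample size are arbitrary) checks the half-maximum points x = ±1 of (3.3.31) and looks at sample quantiles rather than the sample mean and variance, which do not settle down for the Cauchy distribution. Integrating (3.3.31) gives F(x) = 1/2 + (arctan x)/π, so the median is 0 and the quartiles are ±1.

```python
import math
import random


def g(x):
    """Cauchy probability density, Eq. (3.3.31)."""
    return 1.0 / (math.pi * (1.0 + x * x))


# Half maximum at x = -1 and x = +1, hence a full width at half maximum of 2.
half_left, half_right = g(-1.0) / g(0.0), g(1.0) / g(0.0)

# Gun model: theta uniform in (-pi/2, pi/2), hit position on the x axis is tan(theta).
rng = random.Random(42)
hits = sorted(math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(200_000))
n = len(hits)
median = hits[n // 2]
lower_quartile, upper_quartile = hits[n // 4], hits[3 * n // 4]
print(half_left, half_right, median, lower_quartile, upper_quartile)
```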


Example 3.5: Lorentz (Breit-Wigner) distribution

With $\hat{x} = a = 0$ and $\Gamma = 2$ we can write the probability density (3.3.31) of
the Cauchy distribution in the form

$$g(x) = \frac{2}{\pi\Gamma}\,\frac{\Gamma^2}{4(x - a)^2 + \Gamma^2} . \tag{3.3.32}$$


This function is a normalized probability density for all values of a and full
width at half maximum Γ > 0. It is called the probability density of the
Lorentz  or  also  Breit-Wigner  distribution  and  plays  an  important  role  in  the 
physics  of  resonance  phenomena.  ■ 
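The normalization and the FWHM of (3.3.32) can be checked numerically. This sketch uses the substitution x = a + (Γ/2) tan t, which maps the infinite range onto (−π/2, π/2) and turns the integrand into the constant 1/π, so the midpoint rule converges quickly. The values a = 3, Γ = 0.5 are hypothetical.

```python
import math


def breit_wigner(x, a, gamma):
    """Lorentz (Breit-Wigner) probability density, Eq. (3.3.32)."""
    return 2.0 / (math.pi * gamma) * gamma**2 / (4.0 * (x - a) ** 2 + gamma**2)


def normalization(a, gamma, n=10_000):
    """Integral of the density over the real line, using the substitution
    x = a + (gamma / 2) tan(t), t in (-pi/2, pi/2), and the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        x = a + 0.5 * gamma * math.tan(t)
        dx_dt = 0.5 * gamma / math.cos(t) ** 2
        total += breit_wigner(x, a, gamma) * dx_dt * h
    return total


a, gamma = 3.0, 0.5  # hypothetical resonance position and width
print(normalization(a, gamma))  # close to 1
# Half maximum at a - gamma/2 and a + gamma/2, so the FWHM equals gamma.
print(breit_wigner(a + gamma / 2, a, gamma) / breit_wigner(a, a, gamma))
```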


3.4  Distribution  Function  and  Probability  Density 
of  Two  Variables:  Conditional  Probability 

We now consider two random variables x and y and ask for the probability
that both $\mathbf{x} < x$ and $\mathbf{y} < y$. As in the case of a single variable we expect a
distribution function to exist (see Fig. 3.7),

$$F(x, y) = P(\mathbf{x} < x, \mathbf{y} < y) . \tag{3.4.1}$$


Fig.  3.7  :  Distribution  function  of  two  variables. 
We will not enter here into axiomatic details and into the conditions for
the existence of F, since these are always fulfilled in cases of practical interest. If F is a differentiable function of x and y, then the joint probability
density of x and y is

$$f(x, y) = \frac{\partial}{\partial x}\frac{\partial}{\partial y} F(x, y) . \tag{3.4.2}$$


One then has

$$P(a \le \mathbf{x} < b, c \le \mathbf{y} < d) = \int_a^b \left[\int_c^d f(x, y)\,dy\right] dx . \tag{3.4.3}$$


Often we are faced with the following experimental problem. From many
measurements one determines the joint distribution function F(x, y) approximately. One wishes to find the probability for x without consideration of y.
(For example, the probability density for the appearance of a certain infectious disease might be given as a function of date and geographic location.
For some investigations the dependence on the time of year might be of no
interest.)

We integrate Eq. (3.4.3) over the whole range of y and obtain

$$P(a \le \mathbf{x} < b, -\infty < \mathbf{y} < \infty) = \int_a^b \left[\int_{-\infty}^{\infty} f(x, y)\,dy\right] dx = \int_a^b g(x)\,dx ,$$

where

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \tag{3.4.4}$$


is the probability density for x. It is called the marginal probability density
of x. The corresponding distribution for y is

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx . \tag{3.4.5}$$


In  analogy  to  the  independence  of  events  [Eq.  (2.3.6)]  we  can  now  define 
the  independence  of  random  variables.  The  variables  x  and  y  are  said  to  be 
independent  if 

f(x,y)  =  g(x)h(y)  .  (3.4.6) 

Using  the  marginal  distributions  we  can  also  define  conditional  probability 
for  y  under  the  condition  that  x  is  known, 

$$P(y \le \mathbf{y} < y + dy \mid x \le \mathbf{x} < x + dx) . \tag{3.4.7}$$

We define the conditional probability density as

$$f(y|x) = \frac{f(x, y)}{g(x)} , \tag{3.4.8}$$

so that the probability of Eq. (3.4.7) is given by

$$f(y|x)\,dy .$$
The rule of total probability can now also be expressed for distributions:

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{-\infty}^{\infty} f(y|x)\,g(x)\,dx . \tag{3.4.9}$$


In the case of independent variables as defined by Eq. (3.4.6) one obtains directly from Eq. (3.4.8)

$$f(y|x) = \frac{f(x, y)}{g(x)} = \frac{g(x)h(y)}{g(x)} = h(y) . \tag{3.4.10}$$

This was expected since, in the case of independent variables, any constraint
on one variable cannot contribute information about the probability distribution of the other.
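The definitions (3.4.4) to (3.4.10) have direct discrete analogues in which integrals become sums. As a sketch, the following code builds the joint distribution of x = n1 (first die) and y = n1 + n2 (sum of two dice), an illustrative choice borrowed from Sect. 2.4.1, and checks the marginals, the conditional distribution, and the rule of total probability with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Discrete analogue: x = n1 (first die), y = n1 + n2 (sum of both dice).
joint = {}
for n1, n2 in product(range(1, 7), repeat=2):
    key = (n1, n1 + n2)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)

xs = range(1, 7)
ys = range(2, 13)


def f(x, y):
    """Joint probability P(x, y); zero for combinations that cannot occur."""
    return joint.get((x, y), Fraction(0))


g = {x: sum(f(x, y) for y in ys) for x in xs}  # marginal of x, cf. (3.4.4)
h = {y: sum(f(x, y) for x in xs) for y in ys}  # marginal of y, cf. (3.4.5)
cond = {(y, x): f(x, y) / g[x] for x in xs for y in ys}  # f(y|x), cf. (3.4.8)

# Rule of total probability, cf. (3.4.9): h(y) = sum over x of f(y|x) g(x).
assert all(h[y] == sum(cond[(y, x)] * g[x] for x in xs) for y in ys)

# Independence test, cf. (3.4.6): fails, since knowing n1 restricts the sum.
independent = all(f(x, y) == g[x] * h[y] for x in xs for y in ys)
print(independent)  # False
```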


3.5  Expectation  Values,  Variance,  Covariance, 
and  Correlation 

In  analogy  to  Eq.  (3.3.5)  we  define  the  expectation  value  of  a  function  H(x,  y) 
to  be 

$$E\{H(\mathbf{x}, \mathbf{y})\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(x, y) f(x, y)\,dx\,dy . \tag{3.5.1}$$

Similarly,  the  variance  of  H(x,  y)  is  defined  to  be 

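$$\sigma^2\{H(\mathbf{x}, \mathbf{y})\} = E\{[H(\mathbf{x}, \mathbf{y}) - E(H(\mathbf{x}, \mathbf{y}))]^2\} . \tag{3.5.2}$$

For the simple case $H(\mathbf{x}, \mathbf{y}) = a\mathbf{x} + b\mathbf{y}$, Eq. (3.5.1) clearly gives

$$E(a\mathbf{x} + b\mathbf{y}) = aE(\mathbf{x}) + bE(\mathbf{y}) . \tag{3.5.3}$$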

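We now choose

$$H(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\ell \mathbf{y}^m \quad (\ell, m \text{ non-negative integers}) . \tag{3.5.4}$$

The expectation values of such functions are the ℓm-th moments of x, y about
the origin,

$$\lambda_{\ell m} = E(\mathbf{x}^\ell \mathbf{y}^m) . \tag{3.5.5}$$

If we choose more generally

$$H(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - a)^\ell (\mathbf{y} - b)^m , \tag{3.5.6}$$

the expectation values

$$\alpha_{\ell m} = E\{(\mathbf{x} - a)^\ell (\mathbf{y} - b)^m\} \tag{3.5.7}$$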
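are the ℓm-th moments about the point a, b. Of special interest are the moments about the point $\lambda_{10}, \lambda_{01}$,

$$\mu_{\ell m} = E\{(\mathbf{x} - \lambda_{10})^\ell (\mathbf{y} - \lambda_{01})^m\} . \tag{3.5.8}$$

As in the case of a single variable, the lower moments have a special significance, in particular,

$$\begin{aligned}
\lambda_{00} &= \mu_{00} = 1 , \\
\mu_{10} &= \mu_{01} = 0 ; \\
\lambda_{10} &= E(\mathbf{x}) = \hat{x} , \\
\lambda_{01} &= E(\mathbf{y}) = \hat{y} ; \\
\mu_{11} &= E\{(\mathbf{x} - \hat{x})(\mathbf{y} - \hat{y})\} = \operatorname{cov}(\mathbf{x}, \mathbf{y}) , \\
\mu_{20} &= E\{(\mathbf{x} - \hat{x})^2\} = \sigma^2(\mathbf{x}) , \\
\mu_{02} &= E\{(\mathbf{y} - \hat{y})^2\} = \sigma^2(\mathbf{y}) .
\end{aligned} \tag{3.5.9}$$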


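We can now express the variance of $a\mathbf{x} + b\mathbf{y}$ in terms of these quantities:

$$\begin{aligned}
\sigma^2(a\mathbf{x} + b\mathbf{y}) &= E\{[(a\mathbf{x} + b\mathbf{y}) - E(a\mathbf{x} + b\mathbf{y})]^2\} \\
&= E\{[a(\mathbf{x} - \hat{x}) + b(\mathbf{y} - \hat{y})]^2\} \\
&= E\{a^2(\mathbf{x} - \hat{x})^2 + b^2(\mathbf{y} - \hat{y})^2 + 2ab(\mathbf{x} - \hat{x})(\mathbf{y} - \hat{y})\} , \\
\sigma^2(a\mathbf{x} + b\mathbf{y}) &= a^2\sigma^2(\mathbf{x}) + b^2\sigma^2(\mathbf{y}) + 2ab\operatorname{cov}(\mathbf{x}, \mathbf{y}) .
\end{aligned} \tag{3.5.10}$$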
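The algebra leading to (3.5.10) also holds exactly for moments computed from a finite sample with 1/N normalization, which gives a quick numerical check. In this sketch the simulated data, the seed, and the coefficients a = 2, b = −3 are arbitrary choices.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Covariance with 1/N normalization, the sample analogue of mu_11 in (3.5.9)."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)


rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]  # y deliberately correlated with x

a, b = 2.0, -3.0
z = [a * xi + b * yi for xi, yi in zip(x, y)]

lhs = cov(z, z)  # variance of ax + by computed directly
rhs = a**2 * cov(x, x) + b**2 * cov(y, y) + 2 * a * b * cov(x, y)  # Eq. (3.5.10)
print(lhs, rhs)
```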


In  deriving  (3.5.10)  we  have  made  use  of  (3.3.14).  As  another  example  we 
consider 

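$$H(\mathbf{x}, \mathbf{y}) = \mathbf{x}\mathbf{y} . \tag{3.5.11}$$

In this case we have to assume the independence of x and y in the sense
of (3.4.6) in order to obtain the expectation value in a simple form. Then according to (3.5.1)
one has

$$E(\mathbf{x}\mathbf{y}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y\,g(x)h(y)\,dx\,dy = \left(\int_{-\infty}^{\infty} x g(x)\,dx\right)\left(\int_{-\infty}^{\infty} y h(y)\,dy\right) \tag{3.5.12}$$

or

$$E(\mathbf{x}\mathbf{y}) = E(\mathbf{x})E(\mathbf{y}) . \tag{3.5.13}$$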
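For discrete variables the integrals in (3.5.12) become sums, so (3.5.13) can be checked exactly by enumeration. The sketch uses two fair dice as an illustrative example; it also shows that the factorization fails for the dependent pair n1 and n = n1 + n2, where the difference E(n1 n) − E(n1)E(n) equals cov(n1, n) = σ²(n1) = 35/12.

```python
from fractions import Fraction
from itertools import product

PAIRS = list(product(range(1, 7), repeat=2))  # (n1, n2), each with probability 1/36


def E(func):
    """Expectation value of func(n1, n2) over two fair, independent dice."""
    return sum(func(n1, n2) * Fraction(1, 36) for n1, n2 in PAIRS)


# Independent variables n1, n2: E(n1 n2) = E(n1) E(n2), Eq. (3.5.13).
e_prod = E(lambda n1, n2: n1 * n2)
e_fact = E(lambda n1, n2: n1) * E(lambda n1, n2: n2)
print(e_prod, e_fact)  # 49/4 49/4

# Dependent variables n1 and n = n1 + n2: the factorization no longer holds.
lhs = E(lambda n1, n2: n1 * (n1 + n2))
rhs = E(lambda n1, n2: n1) * E(lambda n1, n2: n1 + n2)
print(lhs - rhs)  # 35/12
```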


While the quantities $E(\mathbf{x})$, $E(\mathbf{y})$, $\sigma^2(\mathbf{x})$, $\sigma^2(\mathbf{y})$ are very similar to those
obtained in the case of a single variable, we still have to explain the meaning
of cov(x, y). The concept of covariance is of considerable importance for the
understanding of many of our subsequent problems. From its definition we
see that cov(x, y) is positive if values $x > \hat{x}$ appear preferentially together
with values $y > \hat{y}$. On the other hand, cov(x, y) is negative if in general $x > \hat{x}$ implies $y < \hat{y}$. If, finally, the knowledge of the value of x does not give
us additional information about the probable position of y, the covariance
vanishes. These cases are illustrated in Fig. 3.8.

It is often convenient to use the correlation coefficient

$$\rho(\mathbf{x}, \mathbf{y}) = \frac{\operatorname{cov}(\mathbf{x}, \mathbf{y})}{\sigma(\mathbf{x})\,\sigma(\mathbf{y})} \tag{3.5.14}$$

rather than the covariance.

Both the covariance and the correlation coefficient offer a (necessarily crude) measure of the mutual dependence of x and y. To investigate
this further we now consider two reduced variables u and v in the sense of
Eq. (3.3.17) and determine the variance of their sum by using (3.5.9),

$$\sigma^2(\mathbf{u} + \mathbf{v}) = \sigma^2(\mathbf{u}) + \sigma^2(\mathbf{v}) + 2\rho(\mathbf{u}, \mathbf{v})\,\sigma(\mathbf{u})\,\sigma(\mathbf{v}) . \tag{3.5.15}$$


Fig. 3.8 : Illustration of the covariance between the variables x and y. (a) cov(x, y) > 0;
(b) cov(x, y) ≈ 0; (c) cov(x, y) < 0.


From Eq. (3.3.19) we know that $\sigma^2(\mathbf{u}) = \sigma^2(\mathbf{v}) = 1$. Therefore we have

$$\sigma^2(\mathbf{u} + \mathbf{v}) = 2(1 + \rho(\mathbf{u}, \mathbf{v})) \tag{3.5.16}$$

and correspondingly

$$\sigma^2(\mathbf{u} - \mathbf{v}) = 2(1 - \rho(\mathbf{u}, \mathbf{v})) . \tag{3.5.17}$$

Since the variance always fulfills

$$\sigma^2 \ge 0 , \tag{3.5.18}$$

it follows that

$$-1 \le \rho(\mathbf{u}, \mathbf{v}) \le 1 . \tag{3.5.19}$$

If one now returns to the original variables x, y, then it is easy to show that

$$\rho(\mathbf{u}, \mathbf{v}) = \rho(\mathbf{x}, \mathbf{y}) . \tag{3.5.20}$$

Thus we have finally shown that

$$-1 \le \rho(\mathbf{x}, \mathbf{y}) \le 1 . \tag{3.5.21}$$

We now investigate the limiting cases ±1. For $\rho(\mathbf{u}, \mathbf{v}) = 1$ the variance
is $\sigma^2(\mathbf{u} - \mathbf{v}) = 0$, i.e., the random variable (u − v) is a constant. Expressed in
terms of x, y one has therefore

$$\mathbf{u} - \mathbf{v} = \frac{\mathbf{x} - \hat{x}}{\sigma(\mathbf{x})} - \frac{\mathbf{y} - \hat{y}}{\sigma(\mathbf{y})} = \text{const} . \tag{3.5.22}$$

The equation is always fulfilled if

$$\mathbf{y} = a + b\mathbf{x} , \tag{3.5.23}$$

where b is positive. Therefore in the case of a linear dependence (b positive) between x and y the correlation coefficient takes the value $\rho(\mathbf{x}, \mathbf{y}) = +1$.
Correspondingly one finds $\rho(\mathbf{x}, \mathbf{y}) = -1$ for a negative linear dependence (b
negative). We would expect the covariance to vanish for two independent variables x and y, i.e., for which the probability density obeys Eq. (3.4.6). Indeed
with (3.5.9) and (3.5.1) we find

$$\operatorname{cov}(\mathbf{x}, \mathbf{y}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat{x})(y - \hat{y})\,g(x)h(y)\,dx\,dy = \left(\int_{-\infty}^{\infty} (x - \hat{x})\,g(x)\,dx\right)\left(\int_{-\infty}^{\infty} (y - \hat{y})\,h(y)\,dy\right) = 0 .$$
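The limiting cases can be illustrated with simulated data. A minimal sketch, with the correlation coefficient (3.5.14) estimated from 1/N sample moments (one common convention, assumed here); an exact linear dependence y = a + bx gives ±1 up to rounding, while two independent samples give a value near 0. The seed, the sample size, and the uniform distributions are arbitrary choices.

```python
import math
import random


def corr(u, v):
    """Correlation coefficient (3.5.14) from 1/N sample moments."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    c_uv = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / n
    s_u = math.sqrt(sum((p - mu) ** 2 for p in u) / n)
    s_v = math.sqrt(sum((q - mv) ** 2 for q in v) / n)
    return c_uv / (s_u * s_v)


rng = random.Random(3)
x = [rng.uniform(0.0, 1.0) for _ in range(20_000)]
noise = [rng.uniform(0.0, 1.0) for _ in range(20_000)]

rho_pos = corr(x, [1.0 + 2.0 * xi for xi in x])  # y = a + b x with b > 0
rho_neg = corr(x, [1.0 - 2.0 * xi for xi in x])  # b < 0
rho_ind = corr(x, noise)                         # independent samples
print(rho_pos, rho_neg, rho_ind)
```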


3.6  More  than  Two  Variables:  Vector  and  Matrix  Notation 


In analogy to (3.4.1) we now define a distribution function of n variables
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$:

$$F(x_1, x_2, \ldots, x_n) = P(\mathbf{x}_1 < x_1, \mathbf{x}_2 < x_2, \ldots, \mathbf{x}_n < x_n) . \tag{3.6.1}$$


If the function F is differentiable with respect to the $x_i$, then the joint probability density is given by

$$f(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F(x_1, x_2, \ldots, x_n) . \tag{3.6.2}$$

The probability density of one of the variables $\mathbf{x}_r$, the marginal probability
density, is given by

$$g_r(x_r) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_{r-1}\,dx_{r+1} \cdots dx_n . \tag{3.6.3}$$

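If $H(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ is a function of n variables, then the expectation value of
H is

$$E\{H(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)\} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} H(x_1, x_2, \ldots, x_n)\,f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n . \tag{3.6.4}$$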


With $H = x_r$ one obtains

$$E(x_r) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_r\, f(x_1, x_2, \ldots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n = \int_{-\infty}^{\infty} x_r\, g_r(x_r)\,\mathrm{d}x_r\,. \qquad (3.6.5)$$


The variables are independent if

$$f(x_1, x_2, \ldots, x_n) = g_1(x_1)\,g_2(x_2) \cdots g_n(x_n)\,. \qquad (3.6.6)$$

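In analogy to Eq. (3.6.3) one can define the joint marginal probability density of $\ell$ out of the $n$ variables* by integrating $f$ over only the $n - \ell$ remaining variables,

$$g(x_1, x_2, \ldots, x_\ell) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,\mathrm{d}x_{\ell+1} \cdots \mathrm{d}x_n\,. \qquad (3.6.7)$$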

These $\ell$ variables are independent if

$$g(x_1, x_2, \ldots, x_\ell) = g_1(x_1)\,g_2(x_2) \cdots g_\ell(x_\ell)\,. \qquad (3.6.8)$$


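* Without loss of generality we can take these to be the variables $x_1, x_2, \ldots, x_\ell$.

The moments of order $\ell_1, \ell_2, \ldots, \ell_n$ about the origin are the expectation values of the functions

$$H = x_1^{\ell_1}\, x_2^{\ell_2} \cdots x_n^{\ell_n}\,,$$

that is,

$$\lambda_{\ell_1 \ell_2 \ldots \ell_n} = E\{x_1^{\ell_1}\, x_2^{\ell_2} \cdots x_n^{\ell_n}\}\,. \qquad (3.6.9)$$

In particular one has

$$\lambda_{100\ldots0} = E(x_1) = \hat{x}_1\,, \quad \lambda_{010\ldots0} = E(x_2) = \hat{x}_2\,, \quad \ldots\,, \quad \lambda_{000\ldots1} = E(x_n) = \hat{x}_n\,.$$

The moments about $(\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$ are correspondingly

$$\mu_{\ell_1 \ell_2 \ldots \ell_n} = E\{(x_1 - \hat{x}_1)^{\ell_1}\,(x_2 - \hat{x}_2)^{\ell_2} \cdots (x_n - \hat{x}_n)^{\ell_n}\}\,. \qquad (3.6.10)$$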


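The variances of the $x_i$ are then

$$\begin{aligned}
\mu_{200\ldots0} &= E\{(x_1 - \hat{x}_1)^2\} = \sigma^2(x_1)\,,\\
\mu_{020\ldots0} &= E\{(x_2 - \hat{x}_2)^2\} = \sigma^2(x_2)\,,\\
&\;\;\vdots\\
\mu_{000\ldots2} &= E\{(x_n - \hat{x}_n)^2\} = \sigma^2(x_n)\,.
\end{aligned} \qquad (3.6.11)$$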


The moment with $\ell_i = \ell_j = 1$, $\ell_k = 0$ $(k \ne i, j)$ is called the covariance between the variables $x_i$ and $x_j$,

$$c_{ij} = \operatorname{cov}(x_i, x_j) = E\{(x_i - \hat{x}_i)(x_j - \hat{x}_j)\}\,. \qquad (3.6.12)$$


It proves useful to represent the $n$ variables $x_1, x_2, \ldots, x_n$ as components of a vector $\mathbf{x}$† in an $n$-dimensional space. We can then write the distribution function (3.6.1) as

$$F = F(\mathbf{x})\,. \qquad (3.6.13)$$

Correspondingly, the probability density (3.6.2) is then

$$f(\mathbf{x}) = \frac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F(\mathbf{x})\,. \qquad (3.6.14)$$


The expectation value of a function $H(\mathbf{x})$ is then simply

$$E\{H(\mathbf{x})\} = \int H(\mathbf{x})\, f(\mathbf{x})\,\mathrm{d}\mathbf{x}\,. \qquad (3.6.15)$$


We would now like to express the variances and covariances by means of a matrix. This is called the covariance matrix

† For details on matrix notation see Appendix A.




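$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}. \qquad (3.6.16)$$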


The elements are given by (3.6.12); the diagonal elements are the variances $c_{ii} = \sigma^2(x_i)$. The covariance matrix is clearly symmetric, since

$$c_{ij} = c_{ji}\,. \qquad (3.6.17)$$

If we now also write the expectation values of the $x_i$ as a vector,

$$E(\mathbf{x}) = \hat{\mathbf{x}}\,, \qquad (3.6.18)$$


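we see that each element of the covariance matrix

$$c_{ij} = E\{(x_i - \hat{x}_i)(x_j - \hat{x}_j)\}$$

is the expectation value of the corresponding element of the product of the column vector $(\mathbf{x} - \hat{\mathbf{x}})$ and the row vector $(\mathbf{x} - \hat{\mathbf{x}})^{\mathrm{T}}$, where

$$\mathbf{x}^{\mathrm{T}} = (x_1, x_2, \ldots, x_n)\,, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

The covariance matrix can therefore be written simply as

$$C = E\{(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^{\mathrm{T}}\}\,. \qquad (3.6.19)$$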
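Equation (3.6.19) translates directly into an estimate from data: average the outer products of the deviation vectors. The following is a minimal Java sketch, assuming that the expectation values are replaced by sample means and using the divisor $N$ (an unbiased estimate would use $N - 1$). The class name and the simulated example in `main` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: sample version of C = E{(x - xhat)(x - xhat)^T}, divisor N.
final class CovarianceMatrixDemo {

    /** data[k] is the k-th observed vector x; returns the n x n matrix of averaged outer products. */
    static double[][] covariance(double[][] data) {
        int count = data.length, n = data[0].length;
        double[] mean = new double[n];
        for (double[] x : data)
            for (int i = 0; i < n; i++) mean[i] += x[i] / count;
        double[][] c = new double[n][n];
        for (double[] x : data)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    c[i][j] += (x[i] - mean[i]) * (x[j] - mean[j]) / count;
        return c;
    }

    public static void main(String[] args) {
        Random rng = new Random(3L);
        double[][] data = new double[50_000][];
        for (int k = 0; k < data.length; k++) {
            double x1 = rng.nextGaussian();            // var(x1) = 1
            double x2 = x1 + 2.0 * rng.nextGaussian(); // var(x2) = 5, cov(x1, x2) = 1
            data[k] = new double[] {x1, x2};
        }
        for (double[] row : covariance(data)) System.out.println(Arrays.toString(row));
        // roughly [1, 1] and [1, 5]; the result is symmetric, as required by (3.6.17)
    }
}
```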


3.7  Transformation  of  Variables 

As already mentioned in Sect. 3.3, a function of a random variable is itself a random variable, e.g.,

$$y = y(x)\,.$$

We now ask for the probability density $g(y)$ for the case where the probability density $f(x)$ is known.

Clearly the probability $g(y)\,\mathrm{d}y$ that $y$ falls into a small interval $\mathrm{d}y$ must be equal to the probability $f(x)\,\mathrm{d}x$ that $x$ falls into the "corresponding interval" $\mathrm{d}x$, $f(x)\,\mathrm{d}x = g(y)\,\mathrm{d}y$. This is illustrated in Fig. 3.9. The intervals $\mathrm{d}x$ and $\mathrm{d}y$ are related by

$$|\mathrm{d}x| = \left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| |\mathrm{d}y|\,.$$

Fig. 3.9: Transformation of variables for a probability density of x to y.

The absolute value ensures that we consider the values $\mathrm{d}x$, $\mathrm{d}y$ as intervals without a given direction. Only in this way are the probabilities $f(x)\,\mathrm{d}x$ and $g(y)\,\mathrm{d}y$ always positive. The probability density is then given by

$$g(y) = \left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| f(x)\,. \qquad (3.7.1)$$


We see immediately that $g(y)$ is defined only in the case of a single-valued function $y(x)$, since only then is the derivative in (3.7.1) uniquely defined. For functions where this is not the case, e.g., $y = \sqrt{x}$, one must consider the individual single-valued parts separately, i.e., $y = +\sqrt{x}$. Equation (3.7.1) also guarantees that the probability distribution of $y$ is normalized to unity:

$$\int_{-\infty}^{\infty} g(y)\,\mathrm{d}y = \int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1\,.$$
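Equation (3.7.1) can be checked with a small Monte Carlo experiment. The sketch below assumes, purely as an example, that $x$ is uniformly distributed on $(0, 1]$, so $f(x) = 1$, and takes $y = -\ln x$. Then $x = e^{-y}$ and $|\mathrm{d}x/\mathrm{d}y| = e^{-y}$, so (3.7.1) gives $g(y) = e^{-y}$ for $y \ge 0$. The code compares the fraction of transformed numbers falling into an interval with the integral of $g$ over that interval, and checks the normalization. Class name, seed and sample size are arbitrary choices.

```java
import java.util.Random;

// Sketch: check g(y) = |dx/dy| f(x) for x uniform on (0, 1] and y = -ln x.
final class TransformDemo {

    /** Transformed density g(y) = |dx/dy| f(x(y)) = exp(-y) for y >= 0. */
    static double g(double y) {
        return y >= 0.0 ? Math.exp(-y) : 0.0;
    }

    /** Monte Carlo estimate of P(lo <= y < hi) from n transformed uniform numbers. */
    static double mcFraction(double lo, double hi, int n, long seed) {
        Random rng = new Random(seed);
        int count = 0;
        for (int i = 0; i < n; i++) {
            double x = 1.0 - rng.nextDouble(); // in (0, 1], so log(x) is finite
            double y = -Math.log(x);
            if (y >= lo && y < hi) count++;
        }
        return (double) count / n;
    }

    /** Integral of g over [lo, hi] by the midpoint rule with the given number of steps. */
    static double integrateG(double lo, double hi, int steps) {
        double h = (hi - lo) / steps, sum = 0.0;
        for (int k = 0; k < steps; k++) sum += g(lo + (k + 0.5) * h);
        return sum * h;
    }

    public static void main(String[] args) {
        System.out.println(mcFraction(0.5, 1.0, 1_000_000, 7L)); // Monte Carlo estimate
        System.out.println(integrateG(0.5, 1.0, 1000));          // exp(-0.5) - exp(-1)
        System.out.println(integrateG(0.0, 50.0, 100_000));      // normalization, close to 1
    }
}
```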


In the case of two independent variables $x$, $y$ the transformation to the new variables

$$u = u(x,y)\,, \qquad v = v(x,y) \qquad (3.7.2)$$

can be illustrated in a similar way. One must find the quantity $J$ that relates the probabilities $f(x,y)$ and $g(u,v)$:

$$g(u,v) = f(x,y)\left|J\!\left(\frac{x,y}{u,v}\right)\right|. \qquad (3.7.3)$$


Figure  3.10  shows  in  the  (x,  y)  plane  two  lines  each  for  u  =  const  and  v  = 
const.  They  bound  the  surface  element  dA  of  the  transformed  variables  u,  v 
corresponding  to  the  element  dx  dy  of  the  original  variables. 


Fig. 3.10: Variable transformation from x, y to u, v.


These curves of course need not be straight lines. Since, however, $\mathrm{d}A$ is an "infinitesimal" surface element, it can be treated as a parallelogram, whose area we will now compute. The coordinates of the corner points $a$, $b$, $c$ are

$$\begin{aligned}
x_a &= x(u,v)\,, & y_a &= y(u,v)\,,\\
x_b &= x(u, v + \mathrm{d}v)\,, & y_b &= y(u, v + \mathrm{d}v)\,,\\
x_c &= x(u + \mathrm{d}u, v)\,, & y_c &= y(u + \mathrm{d}u, v)\,.
\end{aligned}$$


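We can expand the last two lines in a series and obtain

$$\begin{aligned}
x_b &= x(u,v) + \frac{\partial x}{\partial v}\,\mathrm{d}v\,, & y_b &= y(u,v) + \frac{\partial y}{\partial v}\,\mathrm{d}v\,,\\
x_c &= x(u,v) + \frac{\partial x}{\partial u}\,\mathrm{d}u\,, & y_c &= y(u,v) + \frac{\partial y}{\partial u}\,\mathrm{d}u\,.
\end{aligned}$$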


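The area of the parallelogram is equal to the absolute value of the determinant

$$\begin{vmatrix} 1 & x_a & y_a \\ 1 & x_b & y_b \\ 1 & x_c & y_c \end{vmatrix} = -\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix} \mathrm{d}u\,\mathrm{d}v = -J\!\left(\frac{x,y}{u,v}\right) \mathrm{d}u\,\mathrm{d}v\,, \qquad (3.7.4)$$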


where  the  sign  is  of  no  consequence  because  of  the  absolute  value  in  Eq.  (3.7.3). 


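The expression

$$J\!\left(\frac{x,y}{u,v}\right) = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix} \qquad (3.7.5)$$

is called the Jacobian (determinant) of the transformation (3.7.2).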
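The Jacobian (3.7.5) can also be evaluated numerically when the partial derivatives are inconvenient to write down. The following is a sketch using central differences. As an example it assumes polar coordinates, $x = r\cos\varphi$, $y = r\sin\varphi$, for which $J(x,y/r,\varphi) = r$, so that (3.7.3) gives $g(r,\varphi) = r\,f(x,y)$. The interface `Map2` and the step size `h` are illustrative choices, not part of the text.

```java
// Sketch: central-difference estimate of the Jacobian (3.7.5) of a map (u, v) -> (x, y).
final class JacobianDemo {

    /** A map (u, v) -> (x, y); hypothetical helper interface. */
    interface Map2 {
        double[] apply(double u, double v);
    }

    /** J = (dx/du)(dy/dv) - (dy/du)(dx/dv), derivatives by central differences with step h. */
    static double jacobian(Map2 f, double u, double v, double h) {
        double[] up = f.apply(u + h, v), um = f.apply(u - h, v);
        double[] vp = f.apply(u, v + h), vm = f.apply(u, v - h);
        double dxdu = (up[0] - um[0]) / (2 * h), dydu = (up[1] - um[1]) / (2 * h);
        double dxdv = (vp[0] - vm[0]) / (2 * h), dydv = (vp[1] - vm[1]) / (2 * h);
        return dxdu * dydv - dydu * dxdv;
    }

    public static void main(String[] args) {
        // u = r, v = phi
        Map2 polar = (r, phi) -> new double[] {r * Math.cos(phi), r * Math.sin(phi)};
        System.out.println(jacobian(polar, 2.0, 0.7, 1e-5)); // close to r = 2
    }
}
```

The step `h` trades truncation error (proportional to $h^2$) against rounding error (roughly machine precision divided by $h$); $10^{-5}$ is a common compromise for smooth functions of order one.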

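For the general case of $n$ variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and the transformation

$$y_1 = y_1(\mathbf{x})\,, \quad y_2 = y_2(\mathbf{x})\,, \quad \ldots\,, \quad y_n = y_n(\mathbf{x}) \qquad (3.7.6)$$

the probability density of the transformed variables is given by

$$g(\mathbf{y}) = \left|J\!\left(\frac{\mathbf{x}}{\mathbf{y}}\right)\right| f(\mathbf{x})\,, \qquad (3.7.7)$$

where the Jacobian is

$$J\!\left(\frac{\mathbf{x}}{\mathbf{y}}\right) = J\!\left(\frac{x_1, x_2, \ldots, x_n}{y_1, y_2, \ldots, y_n}\right) = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\[2mm] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_2} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix}. \qquad (3.7.8)$$

A requirement for the existence of $g(\mathbf{y})$ is of course again the uniqueness of all derivatives occurring in $J$.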


3.8  Linear  and  Orthogonal  Transformations: 
Error  Propagation 


In practice we frequently deal with linear transformations of variables. The main reason is that they are particularly easy to handle, and we therefore try to approximate other transformations by linear ones using Taylor series techniques.

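Consider $r$ linear functions of the $n$ variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$:

$$\begin{aligned}
y_1 &= a_1 + t_{11} x_1 + t_{12} x_2 + \cdots + t_{1n} x_n\,,\\
y_2 &= a_2 + t_{21} x_1 + t_{22} x_2 + \cdots + t_{2n} x_n\,,\\
&\;\;\vdots\\
y_r &= a_r + t_{r1} x_1 + t_{r2} x_2 + \cdots + t_{rn} x_n\,,
\end{aligned} \qquad (3.8.1)$$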

or in matrix notation,

$$\mathbf{y} = T\mathbf{x} + \mathbf{a}\,. \qquad (3.8.2)$$

The expectation value of $\mathbf{y}$ follows from the generalization of (3.5.3),

$$E(\mathbf{y}) = \hat{\mathbf{y}} = T\hat{\mathbf{x}} + \mathbf{a}\,. \qquad (3.8.3)$$


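Together with (3.6.19) one obtains the covariance matrix for $\mathbf{y}$,

$$\begin{aligned}
C_y &= E\{(\mathbf{y} - \hat{\mathbf{y}})(\mathbf{y} - \hat{\mathbf{y}})^{\mathrm{T}}\}\\
&= E\{(T\mathbf{x} + \mathbf{a} - T\hat{\mathbf{x}} - \mathbf{a})(T\mathbf{x} + \mathbf{a} - T\hat{\mathbf{x}} - \mathbf{a})^{\mathrm{T}}\}\\
&= E\{T(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^{\mathrm{T}} T^{\mathrm{T}}\}\\
&= T\,E\{(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^{\mathrm{T}}\}\,T^{\mathrm{T}}\,,
\end{aligned}$$

$$C_y = T C_x T^{\mathrm{T}}\,. \qquad (3.8.4)$$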
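Equation (3.8.4) needs only two matrix products. Below is a minimal Java sketch with plain arrays. The helper names are hypothetical, and the numbers in `main` are an assumed example with $y_1 = x_1 + x_2$, $y_2 = x_1 - x_2$. The constant vector $\mathbf{a}$ does not enter, in agreement with the derivation.

```java
import java.util.Arrays;

// Sketch: C_y = T C_x T^T, Eq. (3.8.4), with plain arrays.
final class LinearPropagation {

    static double[][] multiply(double[][] a, double[][] b) {
        int rows = a.length, inner = b.length, cols = b[0].length;
        double[][] c = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int k = 0; k < inner; k++)
                for (int j = 0; j < cols; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    /** Covariance matrix of y = T x + a; the constant vector a drops out. */
    static double[][] propagate(double[][] t, double[][] cx) {
        return multiply(multiply(t, cx), transpose(t));
    }

    public static void main(String[] args) {
        double[][] cx = {{4.0, 1.0}, {1.0, 1.0}}; // var(x1) = 4, var(x2) = 1, cov(x1, x2) = 1
        double[][] t = {{1.0, 1.0}, {1.0, -1.0}}; // y1 = x1 + x2, y2 = x1 - x2
        for (double[] row : propagate(t, cx)) System.out.println(Arrays.toString(row));
        // prints [7.0, 3.0] and [3.0, 3.0]
    }
}
```

The result reproduces the familiar special cases: $\sigma^2(x_1 + x_2) = 4 + 1 + 2\cdot 1 = 7$ and $\sigma^2(x_1 - x_2) = 4 + 1 - 2\cdot 1 = 3$. Both depend on the covariance.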


Equation (3.8.4) expresses the well-known law of error propagation. Suppose the expectation values $\hat{x}_i$ have been measured. Suppose as well that the errors (i.e., the standard deviations or variances) and the covariances of $\mathbf{x}$ are known. One would like to know the errors of an arbitrary function $\mathbf{y}(\mathbf{x})$. If the errors are relatively small, then the probability density is significantly large only in a small region (on the order of the standard deviation) around the point $\hat{\mathbf{x}}$. One then performs a Taylor expansion of the functions,


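$$y_i = y_i(\hat{\mathbf{x}}) + \left(\frac{\partial y_i}{\partial x_1}\right)_{\mathbf{x} = \hat{\mathbf{x}}} (x_1 - \hat{x}_1) + \cdots + \left(\frac{\partial y_i}{\partial x_n}\right)_{\mathbf{x} = \hat{\mathbf{x}}} (x_n - \hat{x}_n) + \text{higher-order terms}\,,$$

or in matrix notation,

$$\mathbf{y} = \mathbf{y}(\hat{\mathbf{x}}) + T(\mathbf{x} - \hat{\mathbf{x}}) + \text{higher-order terms} \qquad (3.8.5)$$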


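with

$$T = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\[2mm] \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_r}{\partial x_1} & \dfrac{\partial y_r}{\partial x_2} & \cdots & \dfrac{\partial y_r}{\partial x_n} \end{pmatrix}_{\mathbf{x} = \hat{\mathbf{x}}}. \qquad (3.8.6)$$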


If one neglects the higher-order terms and substitutes this matrix $T$ of first partial derivatives into Eq. (3.8.4), then one obtains the law of error propagation. We see in particular that not only the errors (i.e., the variances) of $\mathbf{x}$ but also the covariances can make a significant contribution to the errors of $\mathbf{y}$, that is, to the diagonal elements of $C_y$. If the covariances are not taken into account in the error propagation, the result cannot be trusted.

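The covariances can only be neglected when they vanish anyway, i.e., in the case of independent original variables $\mathbf{x}$. In this case $C_x$ simplifies to a diagonal matrix. The diagonal elements of $C_y$ then have the simple form

$$\sigma^2(y_i) = \sum_{j=1}^{n} \left(\frac{\partial y_i}{\partial x_j}\right)^2_{\mathbf{x} = \hat{\mathbf{x}}} \sigma^2(x_j)\,. \qquad (3.8.7)$$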
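For independent $\mathbf{x}$, Eq. (3.8.7) is simply a weighted sum of the variances. The sketch below uses hypothetical names. As an example it assumes $y = x_1 x_2$ at $\hat{x}_1 = 2$, $\hat{x}_2 = 3$ with $\sigma(x_1) = 0.1$, $\sigma(x_2) = 0.2$, so that $\partial y/\partial x_1 = x_2$ and $\partial y/\partial x_2 = x_1$. It is valid only when all covariances vanish.

```java
// Sketch of Eq. (3.8.7) for independent x:
// sigma^2(y) = sum_j (dy/dx_j)^2 sigma^2(x_j), derivatives taken at x-hat.
final class IndependentErrors {

    /** Variance of y from its gradient at x-hat and the variances of independent x_j. */
    static double variance(double[] gradient, double[] varX) {
        double s = 0.0;
        for (int j = 0; j < gradient.length; j++) s += gradient[j] * gradient[j] * varX[j];
        return s;
    }

    public static void main(String[] args) {
        double x1 = 2.0, x2 = 3.0, sigma1 = 0.1, sigma2 = 0.2;
        double[] grad = {x2, x1};                    // dy/dx1 = x2, dy/dx2 = x1 for y = x1 x2
        double[] varX = {sigma1 * sigma1, sigma2 * sigma2};
        System.out.println(Math.sqrt(variance(grad, varX))); // sqrt(9 * 0.01 + 4 * 0.04) = 0.5
    }
}
```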


If we now call the standard deviation, i.e., the positive square root of the variance, the error of the corresponding quantity and use for this the symbol $\Delta$, Eq. (3.8.7) leads immediately to the formula

$$\Delta y_i = \sqrt{\sum_{j=1}^{n} \left(\frac{\partial y_i}{\partial x_j}\right)^2 (\Delta x_j)^2}\,, \qquad (3.8.8)$$

known commonly as the law of the propagation of errors. It must be emphasized that this expression is incorrect in cases of non-vanishing covariances. This is illustrated in the following example.


Example  3.6:  Error  propagation  and  covariance 


In a Cartesian coordinate system a point $(x, y)$ is measured. The measurement is performed with a coordinate measuring device whose error in $y$ is three times larger than that in $x$. The measurements of $x$ and $y$ are independent. We can therefore take the covariance matrix to be (up to a factor common to all elements)

$$C_{xy} = \begin{pmatrix} 1 & 0 \\ 0 & 9 \end{pmatrix}.$$


We now evaluate the errors (i.e., the covariance matrix) in polar coordinates

$$r = \sqrt{x^2 + y^2}\,, \qquad \varphi = \arctan\frac{y}{x}\,.$$


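The transformation matrix (3.8.6) is

$$T = \begin{pmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[2mm] \dfrac{\partial \varphi}{\partial x} & \dfrac{\partial \varphi}{\partial y} \end{pmatrix} = \begin{pmatrix} \dfrac{x}{r} & \dfrac{y}{r} \\[2mm] -\dfrac{y}{r^2} & \dfrac{x}{r^2} \end{pmatrix}.$$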


To simplify the numerical calculations we consider only the point $(1, 1)$. Then

$$T = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\[2mm] -\dfrac{1}{2} & \dfrac{1}{2} \end{pmatrix}$$

and therefore

$$C_{r\varphi} = \begin{pmatrix} 5 & 2\sqrt{2} \\ 2\sqrt{2} & \dfrac{5}{2} \end{pmatrix}.$$


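We can now return to the original Cartesian coordinate system

$$x = r\cos\varphi\,, \qquad y = r\sin\varphi$$

by use of the transformation, which at the point $r = \sqrt{2}$, $\varphi = \pi/4$ becomes

$$S = \begin{pmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \varphi} \\[2mm] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \varphi} \end{pmatrix} = \begin{pmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -1 \\[2mm] \dfrac{1}{\sqrt{2}} & 1 \end{pmatrix}.$$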
As expected we obtain

$$C'_{xy} = \begin{pmatrix} 1 & 0 \\ 0 & 9 \end{pmatrix}.$$


If instead we had used the formula (3.8.8), i.e., if we had neglected the covariances in the transformation of $r$, $\varphi$ to $x$, $y$, then we would have obtained

$$C''_{xy} = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix},$$


which is different from the original covariance matrix. This example stresses the importance of covariances, since it is obviously not possible to change the errors of measurements by simply transforming back and forth between coordinate systems. ■
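The numbers of Example 3.6 can be reproduced with a few lines of Java. This sketch uses hypothetical class and method names and the analytic matrices $T$ and $S$ from the example. It forms $T C T^{\mathrm{T}}$ twice: once with the full covariance matrix in $(r, \varphi)$, and once with the off-diagonal elements of that matrix set to zero.

```java
import java.util.Arrays;

// Sketch reproducing the numbers of Example 3.6.
final class Example36 {

    /** Returns T C T^T for an n x m matrix t and an m x m matrix c. */
    static double[][] sandwich(double[][] t, double[][] c) {
        int n = t.length, m = c.length;
        double[][] tc = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < m; j++)
                    tc[i][j] += t[i][k] * c[k][j];
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < m; k++)
                    out[i][j] += tc[i][k] * t[j][k];   // (T C) T^T
        return out;
    }

    /** C_{r phi} = T C_xy T^T with T = d(r, phi)/d(x, y) evaluated at (x, y). */
    static double[][] toPolar(double x, double y, double[][] cxy) {
        double r2 = x * x + y * y, r = Math.sqrt(r2);
        double[][] t = {{x / r, y / r}, {-y / r2, x / r2}};
        return sandwich(t, cxy);
    }

    /** C_xy = S C_{r phi} S^T with S = d(x, y)/d(r, phi) evaluated at (r, phi). */
    static double[][] toCartesian(double r, double phi, double[][] crphi) {
        double[][] s = {{Math.cos(phi), -r * Math.sin(phi)}, {Math.sin(phi), r * Math.cos(phi)}};
        return sandwich(s, crphi);
    }

    public static void main(String[] args) {
        double[][] cxy = {{1.0, 0.0}, {0.0, 9.0}};
        double[][] crphi = toPolar(1.0, 1.0, cxy);
        double r = Math.sqrt(2.0), phi = Math.atan2(1.0, 1.0);
        double[][] back = toCartesian(r, phi, crphi);
        double[][] diagonalOnly = {{crphi[0][0], 0.0}, {0.0, crphi[1][1]}};
        double[][] noCov = toCartesian(r, phi, diagonalOnly);
        System.out.println(Arrays.deepToString(crphi)); // about [[5, 2.83], [2.83, 2.5]]
        System.out.println(Arrays.deepToString(back));  // about [[1, 0], [0, 9]]
        System.out.println(Arrays.deepToString(noCov)); // about [[5, 0], [0, 5]]
    }
}
```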

Finally we discuss a special type of linear transformation. We consider the case of exactly $n$ functions $\mathbf{y}$ of the $n$ variables $\mathbf{x}$. In particular we take $\mathbf{a} = 0$ in (3.8.2). One then has

$$\mathbf{y} = R\mathbf{x}\,, \qquad (3.8.9)$$

where $R$ is a square matrix. We now require that the transformation (3.8.9) leave the modulus of a vector invariant,

$$\mathbf{y}^2 = \sum_{i=1}^{n} y_i^2 = \mathbf{x}^2 = \sum_{i=1}^{n} x_i^2\,. \qquad (3.8.10)$$

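Using Eq. (A.1.9) we can write

$$\mathbf{y}^{\mathrm{T}}\mathbf{y} = (R\mathbf{x})^{\mathrm{T}}(R\mathbf{x}) = \mathbf{x}^{\mathrm{T}} R^{\mathrm{T}} R\,\mathbf{x} = \mathbf{x}^{\mathrm{T}}\mathbf{x}\,.$$

This means

$$R^{\mathrm{T}} R = I\,,$$

or written in terms of components,

$$\sum_{i=1}^{n} r_{ik}\, r_{il} = \delta_{kl} = \begin{cases} 0\,, & k \ne l\,,\\ 1\,, & k = l\,. \end{cases} \qquad (3.8.11)$$

A transformation of the type (3.8.9) that fulfills condition (3.8.11) is said to be orthogonal. We now consider the determinant of the transformation matrix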
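$$D = \begin{vmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{vmatrix}$$

and form its square. Since $\det R^{\mathrm{T}} = \det R$ and the determinant of a product is the product of the determinants, we obtain from Eq. (3.8.11)

$$D^2 = \det(R^{\mathrm{T}} R) = \begin{vmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix} = 1\,,$$

i.e., $D = \pm 1$. The determinant $D$, however, is the Jacobian determinant of the transformation (3.8.9),

$$J = \pm 1\,. \qquad (3.8.12)$$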

We multiply the system of equations (3.8.9) on the left by $R^{\mathrm{T}}$ and obtain

$$R^{\mathrm{T}}\mathbf{y} = R^{\mathrm{T}} R\,\mathbf{x}\,.$$

Because of Eq. (3.8.11) this expression reduces to

$$\mathbf{x} = R^{\mathrm{T}}\mathbf{y}\,. \qquad (3.8.13)$$

The inverse transformation of an orthogonal transformation is described simply by the transposed transformation matrix. It is itself orthogonal.
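The properties (3.8.11) to (3.8.13) can be verified numerically for concrete matrices. The sketch uses hypothetical helper names and assumes, as examples, a $2 \times 2$ rotation by an arbitrary angle and a reflection.

```java
import java.util.Arrays;

// Sketch: numerical check of R^T R = I (3.8.11), det R = +-1 (3.8.12) and x = R^T y (3.8.13).
final class OrthogonalDemo {

    /** 2 x 2 rotation matrix for an angle theta in radians. */
    static double[][] rotation(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] {{c, s}, {-s, c}};
    }

    static double[] apply(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int k = 0; k < x.length; k++)
                y[i] += m[i][k] * x[k];
        return y;
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    /** Checks sum_i r_ik r_il = delta_kl, Eq. (3.8.11), within tolerance tol. */
    static boolean isOrthogonal(double[][] r, double tol) {
        int n = r.length;
        for (int k = 0; k < n; k++)
            for (int l = 0; l < n; l++) {
                double s = 0.0;
                for (int i = 0; i < n; i++) s += r[i][k] * r[i][l];
                if (Math.abs(s - (k == l ? 1.0 : 0.0)) > tol) return false;
            }
        return true;
    }

    static double det2(double[][] m) {
        return m[0][0] * m[1][1] - m[0][1] * m[1][0];
    }

    public static void main(String[] args) {
        double[][] rot = rotation(0.3), refl = {{1.0, 0.0}, {0.0, -1.0}};
        double[] y = apply(rot, new double[] {3.0, 4.0});
        System.out.println(isOrthogonal(rot, 1e-12) + " " + det2(rot) + " " + det2(refl)); // true, ~1, -1
        System.out.println(Math.hypot(y[0], y[1]));                      // modulus preserved: 5
        System.out.println(Arrays.toString(apply(transpose(rot), y)));   // back to about [3, 4]
    }
}
```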

An important property of any linear transformation of the type

$$y_1 = r_{11} x_1 + r_{12} x_2 + \cdots + r_{1n} x_n$$

is the following. By constructing additional functions $y_2, y_3, \ldots, y_n$ of equivalent form it can be extended to yield an orthogonal transformation as long as the condition

$$\sum_{i=1}^{n} r_{1i}^2 = 1$$

is fulfilled.
Computer Generated Random Numbers:
The Monte Carlo Method


4.1  Random  Numbers 

Up to now in this book we have considered the observation of random variables, but not prescriptions for producing them. In many applications, however, it is useful to have a sequence of values of a randomly distributed variable x. Since operations must often be carried out with a large number of such random numbers, it is particularly convenient to have them directly available on a computer. The correct procedure to create such random numbers would be to use a statistical process, e.g., the measurement of the time between two decays from a radioactive source, and to transfer the measured results into the computer. In practical applications, however, the random numbers are almost always calculated directly by the computer. Since this works in a strictly deterministic way, the resulting values are not really random, but rather can be exactly predicted. They are therefore called pseudorandom.

Computations  with  random  numbers  currently  make  up  a  large  part  of 
all  computer  calculations  in  the  planning  and  evaluation  of  experiments.  The 
statistical  behavior  which  stems  either  from  the  nature  of  the  experiment  or 
from  the  presence  of  measurement  errors  can  be  simulated  on  the  computer. 
The  use  of  random  numbers  in  computer  programs  is  often  called  the  Monte 
Carlo  method. 

We begin this chapter with a discussion of the representation of numbers in a computer (Sect. 4.2), which is indispensable for an understanding of what follows. The best studied method for the creation of uniformly distributed random numbers is the subject of Sects. 4.3–4.7. Sections 4.8 and 4.9 cover the creation of random numbers that follow an arbitrary distribution and the especially common case of normally distributed numbers. In the last two sections one finds discussion and examples of the Monte Carlo method in applications of numerical integration and simulation.
In many examples and exercises we will simulate measurements with the Monte Carlo method and then analyze them. In this way we have a computer laboratory, which allows us to study individually the influence of simulated measurement errors on the results of an analysis.

4.2  Representation  of  Numbers  in  a  Computer 

For most applications the representation of numbers used in a computation is unimportant. It can be of decisive significance, however, for the properties of computer-generated random numbers. We will restrict ourselves to the binary representation, which is used today in practically all computers. The elementary unit of information is the bit,* which can assume the values 0 or 1. This is realized physically by two distinguishable electric or magnetic states of a component in the computer.

If one has $k$ bits available for the representation of an integer, then 1 bit is sufficient to encode the sign. The remaining $k - 1$ bits are used for the binary representation of the absolute value in the form

$$a = a^{(k-2)}\,2^{k-2} + a^{(k-3)}\,2^{k-3} + \cdots + a^{(1)}\,2^{1} + a^{(0)}\,2^{0}\,. \qquad (4.2.1)$$

Here each of the coefficients $a^{(j)}$ can assume only the values 0 or 1, and thus can be represented by a single bit.

The binary representation for non-negative integers is

00⋯000 = 0
00⋯001 = 1
00⋯010 = 2
00⋯011 = 3
⋮


One could simply use the first bit to encode the sign and represent the corresponding negative numbers by replacing the 0 in the first bit by a 1. That would give, however, two different representations for the number zero, namely +0 and −0. Instead, one uses for negative numbers the "complementary representation"

11⋯111 = −1
11⋯110 = −2
11⋯101 = −3
⋮


*  Abbreviation  of  binary  digit. 
Then using $k$ bits, integers in the interval

$$-2^{k-1} \le x \le 2^{k-1} - 1 \qquad (4.2.2)$$

can be represented.

In most computers 8 bits are grouped together into one byte. Four bytes are generally used for the representation of integers, i.e., $k = 32$, $2^{k-1} - 1 = 2\,147\,483\,647$. In many small computers only two bytes are available, $k = 16$, $2^{k-1} - 1 = 32\,767$. This constraint (4.2.2) must be taken into consideration when designing a program to generate random numbers.
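Java's `int` ($k = 32$) and `short` ($k = 16$) types use exactly this complementary (two's-complement) representation, so the limits of (4.2.2) can be read off directly from the language. The helper `lowBits` is hypothetical. It prints the $k$ lowest bits of a value and reproduces the bit patterns listed above for $k = 5$.

```java
// Sketch: Java's int and short are two's-complement integers with k = 32 and k = 16 bits.
final class IntegerRangeDemo {

    /** The k lowest-order bits of value, most significant first. */
    static String lowBits(int value, int k) {
        StringBuilder sb = new StringBuilder();
        for (int i = k - 1; i >= 0; i--) sb.append((value >>> i) & 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE + " " + Integer.MAX_VALUE); // -2147483648 2147483647
        System.out.println(Short.MIN_VALUE + " " + Short.MAX_VALUE);     // -32768 32767
        for (int v = 3; v >= -3; v--) System.out.println(lowBits(v, 5) + " = " + v);
        System.out.println(Integer.MAX_VALUE + 1); // overflow wraps around silently: -2147483648
    }
}
```

The last line shows why the constraint matters for random number generators: Java integer arithmetic does not signal overflow, it silently wraps around.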

Before turning to the representation of fractional numbers in a computer, let us consider a finite decimal fraction, which we can write in various ways, e.g.,

$$x = 17.23 = 0.1723 \cdot 10^{2}\,,$$

or in general

$$x = M \cdot 10^{e}\,.$$

The quantities $M$ and $e$ are called the mantissa and exponent, respectively. One chooses the exponent such that the mantissa's nonzero digits are all to the right of the decimal point, and the first place after the decimal point is not zero. If one has available $n$ decimal places for the representation of the value $M$, then

$$m = M \cdot 10^{n}$$

is an integer. In our example, $n = 4$ and $m = 1723$. In this way the decimal fraction $x$ is represented by the two integers $m$ and $e$.

The representation of fractions in the binary system is done in a completely analogous way. One decomposes a number of the form

$$x = M \cdot 2^{e} \qquad (4.2.3)$$

into a mantissa $M$ and exponent $e$. If $n_m$ bits are available for the representation of the mantissa (including sign), it can be expressed by the integer

$$m = M \cdot 2^{n_m - 1}\,, \qquad -2^{n_m - 1} \le m \le 2^{n_m - 1} - 1\,. \qquad (4.2.4)$$

If the exponent with its sign is represented by $n_e$ bits, then it can cover the interval

$$-2^{n_e - 1} \le e \le 2^{n_e - 1} - 1\,. \qquad (4.2.5)$$

In our Java classes we use floating-point numbers of the type double with 64 bits, $n_m = 53$ for the mantissa and $n_e = 11$ for the exponent.
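Java's `double` follows the IEEE 754 binary64 format: 1 sign bit, 11 exponent bits stored with a bias of 1023, and 52 stored mantissa bits. The 52 mantissa bits together with the sign make up the $n_m = 53$ bits mentioned above. Unlike the convention in the text, IEEE 754 normalizes the mantissa to $1 \le M < 2$ and does not store the leading 1. The sketch below (hypothetical names, normal numbers only) takes a double apart with `Double.doubleToLongBits` and rebuilds it.

```java
// Sketch: split a normal Java double (IEEE 754 binary64) into sign, exponent and mantissa bits.
final class DoubleBitsDemo {

    /** Returns {sign bit, unbiased exponent e, 52 stored fraction bits} of a normal double. */
    static long[] decompose(double x) {
        long bits = Double.doubleToLongBits(x);
        long sign = bits >>> 63;
        long exponent = ((bits >>> 52) & 0x7FFL) - 1023;   // 11 exponent bits, bias 1023
        long fraction = bits & 0xFFFFFFFFFFFFFL;           // 52 fraction bits
        return new long[] {sign, exponent, fraction};
    }

    /** Rebuilds x = (-1)^sign * (1 + fraction / 2^52) * 2^e from the three parts. */
    static double compose(long[] parts) {
        double mantissa = 1.0 + parts[2] / 4503599627370496.0; // 2^52
        double magnitude = Math.scalb(mantissa, (int) parts[1]);
        return parts[0] == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        long[] p = decompose(17.23);
        System.out.println(p[0] + " " + p[1] + " " + Long.toBinaryString(p[2]));
        System.out.println(compose(p));              // 17.23 again
        System.out.println(Math.getExponent(17.23)); // 4, since 16 <= 17.23 < 32
    }
}
```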
For the interval of values in which a floating-point number can be represented in a computer, the constraint (4.2.2) no longer applies; one has instead the weaker condition

$$2^{e_{\min} - 1} \le |x| < 2^{e_{\max}}\,. \qquad (4.2.6)$$

Here $e_{\min}$ and $e_{\max}$ are given by (4.2.5). If 11 bits are available for representing the exponent (including sign), then one has $e_{\max} = 2^{10} - 1 = 1023$. Therefore, one has the constraint $|x| < 2^{1023} \approx 10^{308}$.

When computing with floating-point numbers, the concept of the relative precision of the representation is of considerable significance. A fixed number of binary digits, corresponding to a fixed number of decimal places, is available for the representation of the mantissa $M$. If we designate by $\alpha$ the smallest possible mantissa, then two numbers $x_1$ and $x_2$ can still be represented as being distinct if

$$x_1 = x = M \cdot 2^{e}\,, \qquad x_2 = (M + \alpha) \cdot 2^{e}\,.$$

The absolute precision in the representation of $x$ is thus

$$\Delta x = x_2 - x_1 = \alpha \cdot 2^{e}\,,$$

which depends on the exponent of $x$. The relative precision

$$\frac{\Delta x}{x} = \frac{\alpha}{M}$$

is, in contrast, essentially independent of $x$. If $n$ binary digits are available for the representation of the mantissa (counted here as an integer, as in (4.2.4)), then one has $M \approx 2^{n}$, since the exponent is chosen such that all $n$ places for the mantissa are completely used. The smallest possible mantissa is $\alpha = 2^{0}$, so that the relative precision in the representation of $x$ is

$$\frac{\Delta x}{x} = 2^{-n}\,. \qquad (4.2.7)$$
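In Java the absolute precision $\Delta x$ of a double is available as `Math.ulp(x)`, the distance from `x` to the next double of larger magnitude. The sketch below (hypothetical names) shows that $\Delta x$ grows with $x$, while $\Delta x/x$ stays between $2^{-53}$ and $2^{-52}$ for all normal numbers. The value $2^{-52}$ corresponds to (4.2.7) with the 52 stored mantissa bits.

```java
// Sketch: absolute and relative precision of Java doubles via Math.ulp.
final class PrecisionDemo {

    /** Relative spacing Delta x / x of doubles near x (x normal, nonzero). */
    static double relativeSpacing(double x) {
        return Math.ulp(x) / Math.abs(x);
    }

    public static void main(String[] args) {
        for (double x : new double[] {1e-300, 1.0, 17.23, 1e300})
            System.out.println(x + "  ulp = " + Math.ulp(x) + "  ulp/x = " + relativeSpacing(x));
        System.out.println(1.0 + Math.ulp(1.0) != 1.0);         // true: still distinct from 1
        System.out.println(1.0 + Math.ulp(1.0) / 2 == 1.0);     // true: rounds back to 1
        System.out.println(Math.getExponent(Double.MAX_VALUE)); // 1023 = e_max
    }
}
```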


4.3  Linear  Congruential  Generators 

Since, as mentioned, computers work in a strictly deterministic way, all (pseudo)random numbers¹ generated in a computer are in the most general case a function of all of the preceding (pseudo)random numbers,

$$x_{j+1} = f(x_j, x_{j-1}, \ldots, x_1)\,. \qquad (4.3.1)$$

Programs for creating random numbers are called random number generators.

¹ Since the numbers are pseudorandom and not strictly random, we use the notation $x$ in place of $\mathrm{x}$.


The best studied algorithm is based on the following rule,

$$x_{j+1} = (a\,x_j + c) \bmod m\,. \qquad (4.3.2)$$


All of the quantities in (4.3.2) are integer valued. Generators using this rule are called linear congruential generators (LCG). The symbol mod m or modulo m in (4.3.2) means that the expression before the symbol is divided by $m$ and only the remainder of the result is taken, e.g., 6 mod 5 = 1. Each random number made by an LCG according to the rule (4.3.2) depends only on the number immediately preceding it and on the constant $a$ (the multiplier), on $c$ (the increment), and on $m$ (the modulus). When these three constants and one initial value $x_0$ are given, the infinite sequence of random numbers $x_0, x_1, \ldots$ is determined.

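The sequence is clearly periodic: each $x_j$ can take only the $m$ values $0, 1, \ldots, m - 1$, and each value determines its successor. The maximum period length is therefore $m$. Only partial sequences that are short compared to the period length are useful for computations.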

Theorem on the maximum period of an LCG with $c \ne 0$:

An LCG defined by the values $m$, $a$, $c$, and $x_0$ has the period $m$ if and only if

(a) $c$ and $m$ have no common factors (other than 1);

(b) $b = a - 1$ is a multiple of $p$ for every prime number $p$ that is a factor of $m$;

(c) $b$ is a multiple of 4 if $m$ is a multiple of 4.

The proof of this theorem as well as the theorems of Sect. 4.4 can be found in, e.g., Knuth [2].

A simple example is $c = 3$, $a = 5$, $m = 16$. One can easily compute that $x_0 = 0$ results in the sequence

0, 3, 2, 13, 4, 7, 6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, … .

Since the period $m$ can only be attained when all $m$ possible values are actually assumed, the choice of the initial value $x_0$ is unimportant.
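The example can be checked with a direct implementation of (4.3.2). The sketch below (hypothetical names) uses `long` arithmetic and is intended only for small illustrative parameters. The second parameter set, $c = 2$, violates condition (a) and gives a shorter period.

```java
import java.util.Arrays;

// Sketch of the LCG rule (4.3.2), x_{j+1} = (a x_j + c) mod m, for small parameters.
final class LcgDemo {

    /** First 'count' values x_0, x_1, ... of the LCG; assumes 0 <= x0 < m and a, c >= 0. */
    static long[] sequence(long a, long c, long m, long x0, int count) {
        long[] xs = new long[count];
        long x = x0;
        for (int j = 0; j < count; j++) {
            xs[j] = x;
            x = (a * x + c) % m;
        }
        return xs;
    }

    /** Number of steps until x0 recurs, or -1 if it does not recur within m steps. */
    static long period(long a, long c, long m, long x0) {
        long x = x0;
        for (long k = 1; k <= m; k++) {
            x = (a * x + c) % m;
            if (x == x0) return k;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sequence(5, 3, 16, 0, 17)));
        System.out.println(period(5, 3, 16, 0)); // 16: conditions (a) to (c) hold
        System.out.println(period(5, 2, 16, 0)); // 8: c and m share the factor 2
    }
}
```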

4.4  Multiplicative  Linear  Congruential  Generators 

If one chooses $c = 0$ in (4.3.2), then the algorithm simplifies to

$$x_{j+1} = (a\,x_j) \bmod m\,. \qquad (4.4.1)$$


Generators based on this rule are called multiplicative linear congruential generators (MLCG). The computation becomes somewhat shorter and thus faster. The exact value zero, however, can no longer be produced (except for the unusable sequence 0, 0, …). In addition the period becomes shorter. Before giving the theorem on the maximum period length for this case, we introduce the concept of the primitive element modulo $m$.

Let $a$ be an integer having no common factors (except unity) with $m$. We consider the positive integers $\lambda$ for which $a^{\lambda} \bmod m = 1$. The smallest value of $\lambda$ for which this relation is valid is called the order of $a$ modulo $m$. All values $a$ having the largest possible order $\lambda(m)$ are called primitive elements modulo $m$.


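Theorem on the order $\lambda(m)$ of a primitive element modulo $m$: For every integer $e$ and prime number $p$

$$\begin{aligned}
\lambda(2) &= 1\,;\\
\lambda(4) &= 2\,;\\
\lambda(2^{e}) &= 2^{e-2}\,, && e > 2\,;\\
\lambda(p^{e}) &= p^{e-1}(p - 1)\,, && p > 2\,.
\end{aligned} \qquad (4.4.2)$$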


Theorem on primitive elements modulo $p^e$: The number $a$ is a primitive element modulo $p^e$ if and only if

$$\begin{aligned}
&a \text{ odd}\,, && p^e = 2\,;\\
&a \bmod 4 = 3\,, && p^e = 4\,;\\
&a \bmod 8 = 3, 5, 7\,, && p^e = 8\,;\\
&a \bmod 8 = 3, 5\,, && p = 2,\ e > 3\,;\\
&a \bmod p \ne 0\,,\ a^{(p-1)/q} \bmod p \ne 1\,, && p > 2,\ e = 1,\ q \text{ every prime factor of } p - 1\,;\\
&a \bmod p \ne 0\,,\ a^{p-1} \bmod p^2 \ne 1\,,\ a^{(p-1)/q} \bmod p \ne 1\,, && p > 2,\ e > 1,\ q \text{ every prime factor of } p - 1\,.
\end{aligned} \qquad (4.4.3)$$


For large values of $p$ the primitive elements must be determined with computer programs with the aid of this theorem.

Theorem on the maximum period of an MLCG: The maximum period of an MLCG defined by the quantities $m$, $a$, $c = 0$, $x_0$ is equal to the order $\lambda(m)$. This is attained if the multiplier $a$ is a primitive element modulo $m$ and the initial value $x_0$ and the modulus $m$ have no common factors (except unity).

In fact, MLC generators are frequently used in practice. There are two cases of practical significance in choosing the modulus $m$.


(i) $m = 2^e$: Here $m - 1$ can be the largest integer that can be represented on the computer. According to (4.4.2) the maximum attainable period length is $m/4$.

(ii) $m = p$: If $m$ is a prime number, the period $m - 1$ can be attained according to (4.4.2).
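For small moduli the order of $a$ modulo $m$, and hence $\lambda(m)$, can be found by brute force and compared with (4.4.2). An MLCG started at an $x_0$ coprime to $m$ has a period equal to the order of its multiplier $a$, since $x_j = a^j x_0 \bmod m$ returns to $x_0$ exactly when $a^j \bmod m = 1$. The class below is a hypothetical sketch, suitable only for small $m$.

```java
// Sketch: brute-force order of a modulo m and the largest order lambda(m), small m only.
final class OrderDemo {

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Smallest lambda >= 1 with a^lambda mod m == 1; requires m >= 2 and gcd(a, m) == 1. */
    static long order(long a, long m) {
        if (m < 2 || gcd(a, m) != 1) throw new IllegalArgumentException("need m >= 2 and gcd(a, m) = 1");
        long base = a % m, x = base, lambda = 1;
        while (x != 1) {
            x = (x * base) % m;
            lambda++;
        }
        return lambda;
    }

    /** Largest order over all a in [1, m) coprime to m. */
    static long maxOrder(long m) {
        long best = 0;
        for (long a = 1; a < m; a++)
            if (gcd(a, m) == 1) best = Math.max(best, order(a, m));
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxOrder(16) + " " + maxOrder(32)); // 4 = 16/4, 8 = 32/4: case (i)
        System.out.println(maxOrder(7) + " " + maxOrder(31));  // 6 = 7 - 1, 30 = 31 - 1: case (ii)
        System.out.println(order(3, 16) + " " + order(7, 16)); // 4 (3 mod 8 = 3), 2 (7 mod 8 = 7)
    }
}
```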

4.5  Quality  of  an  MLCG:  Spectral  Test 

When producing random numbers, the main goal is naturally not just to attain the longest possible period. This could be achieved very simply with the sequence 0, 1, 2, …, m − 1, 0, 1, … . Much more importantly, the individual elements within a period should follow each other "randomly". First the modulus $m$ is chosen, and then one chooses various multipliers $a$ corresponding to (4.4.3) that guarantee a maximum period. One then constructs generators with the constants $a$, $m$, and $c = 0$ in the form of computer programs and checks the randomness of the resulting numbers with statistical tests. General tests, also applicable to this particular question, will be discussed in Sect. 8. The spectral test was especially developed for investigating random numbers, in particular for detecting non-random dependencies between neighboring elements in a sequence.

In a simple example we first consider the case $a = 3$, $m = 7$, $c = 0$, $x_0 = 1$ and obtain the sequence

1, 3, 2, 6, 4, 5, 1, … .

We now form pairs of neighboring numbers

$$(x_j, x_{j+1})\,, \qquad j = 0, 1, \ldots, n - 1\,. \qquad (4.5.1)$$

Here $n$ is the period, which in our example is $n = m - 1 = 6$. In Fig. 4.1 the number pairs (4.5.1) are represented as points in a two-dimensional Cartesian coordinate system. We note, possibly with surprise, that they form a regular lattice. The surprise is somewhat less, however, when we consider two features of the algorithm (4.3.2):

(i) All coordinate values $x_j$ are integers. In the accessible range of values $1 \le x_j \le n$ there are, however, only $n^2$ number pairs (4.5.1) for which both elements are integers. They lie on a lattice of horizontal and vertical lines. Two neighboring lines have a separation of one.

(ii) There are only $n$ different pairs (4.5.1), so that only a fraction of the $n^2$ points mentioned in (i) are actually occupied.

We now go from integer numbers $x_j$ to transformed numbers

$$u_j = x_j / m \qquad (4.5.2)$$

with the property

$$0 < u_j < 1\,. \qquad (4.5.3)$$
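The following is a minimal Java sketch of an MLCG delivering the numbers $u_j$ of (4.5.2). It uses the pair $m = 2\,147\,483\,563$, $a = 40\,014$ from Table 4.1. With 64-bit `long` arithmetic the product $a\,x$ stays below $2^{63}$, so no special overflow handling is needed. This is a simplification, not the book's own generator class. Because $m$ is prime and the seed lies in $[1, m - 1]$, the value $x_j = 0$ never occurs, so $0 < u_j < 1$.

```java
// Sketch: MLCG x_{j+1} = (a x_j) mod m, Eq. (4.4.1), returning u_j = x_j / m, Eq. (4.5.2).
final class MlcgUniform {
    static final long M = 2147483563L; // modulus from Table 4.1 (32-bit column)
    static final long A = 40014L;      // corresponding multiplier
    private long x;

    MlcgUniform(long seed) {
        if (seed <= 0 || seed >= M) throw new IllegalArgumentException("seed must lie in [1, m - 1]");
        x = seed;
    }

    /** Advances the generator and returns u = x / m, which lies strictly between 0 and 1. */
    double next() {
        x = (A * x) % M;               // A * x < 2^63, so the long product cannot overflow
        return (double) x / M;
    }

    public static void main(String[] args) {
        MlcgUniform gen = new MlcgUniform(1L);
        for (int i = 0; i < 5; i++) System.out.println(gen.next());
    }
}
```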


For simplicity we assume that the sequence $x_0, x_1, \ldots$ has the maximum possible period $m$ for an MLC generator. The pairs

$$(u_j, u_{j+1})\,, \qquad j = 0, 1, \ldots, m - 1\,, \qquad (4.5.4)$$

lie in a square whose side has unit length. Because the $x_j$ are integers, the spacing between the horizontal or vertical lattice lines on which the points (4.5.4) must lie is $1/m$. Only a small fraction of these lattice points, however, is occupied. Families of equidistant parallel lines can be constructed that pass through all of the points actually occupied. We now consider the spacing of neighboring lines within a family, look for the family for which this distance is a maximum, and call this distance $d_2$.

Fig. 4.1: Diagram of number pairs (4.5.1) for a = 3, m = 7.

If the distances between neighboring lattice lines are approximately equal for all families, we can be certain of having a maximally uniform distribution of the occupied lattice points on the unit square. Since this distance is $1/m$ for a completely occupied lattice ($m^2$ points), we obtain for a uniformly occupied lattice with $m$ points a distance of $d_2 \approx m^{-1/2}$. With a very nonuniform lattice one obtains the considerably larger value $d_2 \gg m^{-1/2}$.
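For small $m$ the distance $d_2$ can be found by brute force. All points $(u_j, u_{j+1})$ lie on the lines $s_1 u + s_2 v = k$ ($k$ integer) whenever $s_1 + a s_2$ is a multiple of $m$, because $s_1 x_j + s_2 x_{j+1} \equiv (s_1 + a s_2)\,x_j \equiv 0 \pmod m$. Neighboring lines of such a family are $1/\sqrt{s_1^2 + s_2^2}$ apart, so $d_2$ belongs to the shortest nonzero vector $(s_1, s_2)$ with this property. The sketch below uses hypothetical names and a plain exhaustive search, which is practical only for small $m$.

```java
// Sketch: brute-force d_2 for an MLCG x_{j+1} = a x_j mod m with small m.
final class SpectralTest2D {

    /** Largest spacing d_2 = 1 / min |(s1, s2)| over (s1, s2) != 0 with s1 + a s2 = 0 (mod m). */
    static double d2(long a, long m) {
        long best = Long.MAX_VALUE;                        // smallest s1^2 + s2^2 found so far
        for (long s2 = -m; s2 <= m; s2++)
            for (long s1 = -m; s1 <= m; s1++) {
                if (s1 == 0 && s2 == 0) continue;
                if (Math.floorMod(s1 + a * s2, m) == 0) best = Math.min(best, s1 * s1 + s2 * s2);
            }
        return 1.0 / Math.sqrt(best);
    }

    public static void main(String[] args) {
        // a = 3, m = 7: the pairs lie on the lines u + 2v = k, so d_2 = 1/sqrt(5) = 0.447
        System.out.println(d2(3, 7) + " vs m^(-1/2) = " + 1.0 / Math.sqrt(7.0));
    }
}
```

For $a = 3$, $m = 7$ one finds $d_2 = 1/\sqrt{5} \approx 0.447$, somewhat larger than $m^{-1/2} \approx 0.378$.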


If one now considers not only pairs (4.5.4), but $t$-tuples of numbers

$$(u_j, u_{j+1}, \ldots, u_{j+t-1})\,, \qquad (4.5.5)$$

one sees that the corresponding points lie on families of $(t-1)$-dimensional hyperplanes in a $t$-dimensional cube whose side has unit length. Let us investigate as before the distance between neighboring hyperplanes of a family. We determine the family with the largest spacing and designate this by $d_t$. One expects for a uniform distribution of points (4.5.5) a distance

$$d_t \approx m^{-1/t}\,. \qquad (4.5.6)$$

If the lattice is nonuniform, however, we expect

$$d_t \gg m^{-1/t}\,. \qquad (4.5.7)$$

The situations (4.5.6) and (4.5.7) are shown in Fig. 4.2. Naturally one tries to achieve as uniform a lattice as possible. One should note that there is at least a distance (4.5.6) between the lattice points. The lowest decimal places of random numbers are therefore not random, but rather reflect the structure of the lattice.

Theoretical considerations give a lower limit for the lattice spacing $d_t$,

$$d_t \ge d_t^* = c_t\, m^{-1/t} . \qquad (4.5.8)$$

The constants $c_t$ are of order unity. They have the numerical values [2]

$$c_2 = (4/3)^{-1/4} , \quad c_3 = 2^{-1/6} , \quad c_4 = 2^{-1/4} , \quad c_5 = 2^{-3/10} , \quad c_6 = (64/3)^{-1/12} , \quad c_7 = 2^{-3/7} , \quad c_8 = 2^{-1/2} . \qquad (4.5.9)$$

Fig. 4.2: Diagram of number pairs (4.5.4) for various small values of $a$ and $m$.

Table 4.1: Suitable moduli m and multipliers a for portable MLC generators for computers with 32-bit (16-bit) integer arithmetic.

           32 bit                 16 bit
        m             a          m        a
    2147483647      39373      32749     162
    2147483563      40014      32363     157
    2147483399      40692      32143     160
    2147482811      41546      32119     172
    2147482801      42024      31727     146
    2147482739      45742      31657     142

The spectral test can now be carried out as follows. For given values $(m, a)$ of the modulus and multiplier of an MLCG one determines with a computer algorithm [2] the values $d_t(m, a)$ for small $t$, e.g., $t = 2, 3, \ldots, 6$. One constructs the test quantities

$$S_t(m, a) = \frac{d_t^*(m)}{d_t(m, a)} \qquad (4.5.10)$$

and accepts the generator as usable if the $S_t(m, a)$ do not fall below a given limit. Table 4.1 gives the results of extensive investigations by L'Ecuyer [3]. The moduli $m$ are prime numbers close to the maximum integer values representable by 16 or 32 bit. The multipliers are primitive elements modulo $m$. They fulfill the requirement $a < \sqrt{m}$ (see Sect. 4.6). The prime numbers were chosen such that $a$ does not have to be much smaller than $\sqrt{m}$, but the condition $S_t(m, a) > 0.65$, $t = 2, 3, \ldots, 6$, still holds for all pairs $(m, a)$ in Table 4.1.
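The two-dimensional case of the spectral test can be illustrated by a brute-force search. This is practical only for small moduli such as the one in Fig. 4.1, and it is not the algorithm of [2].

The sketch relies on a standard fact that is not derived in the text, so treat it as an assumption. A family of parallel lines $s_1 u + s_2 v = \text{integer}$, with integers $s_1, s_2$, contains all pairs $(u_j, u_{j+1})$ exactly when $s_1 + a s_2 \equiv 0 \pmod m$. The spacing of such a family is $1/\sqrt{s_1^2 + s_2^2}$, so $d_2$ is one over the length of the shortest such vector.

The class and method names are hypothetical. For $a = 3$, $m = 7$ the search finds the vector $(1, 2)$, hence $d_2 = 1/\sqrt{5} \approx 0.447$.

```java
/** Hypothetical helper: brute-force 2-D spectral test, usable only for small moduli m. */
class SpectralTest2D {

    /**
     * Largest spacing d_2 between neighbouring parallel lines that cover all pairs
     * (x_j/m, x_{j+1}/m) of the MLCG x_{j+1} = (a x_j) mod m.
     */
    static double d2(long m, long a) {
        // Lines s1*u + s2*v = integer cover every pair iff s1 + a*s2 = 0 (mod m);
        // their spacing is 1/sqrt(s1^2 + s2^2), so d_2 is 1 over the shortest such vector.
        // The shortest vector has squared length below 2m, so |s1|, |s2| <= bound suffices.
        long bound = (long) Math.ceil(Math.sqrt(2.0 * m)) + 1;
        long best = m * m;                       // (s1, s2) = (m, 0) always qualifies
        for (long s2 = -bound; s2 <= bound; s2++) {
            for (long s1 = -bound; s1 <= bound; s1++) {
                if (s1 == 0 && s2 == 0) continue;
                if (Math.floorMod(s1 + a * s2, m) == 0) {
                    best = Math.min(best, s1 * s1 + s2 * s2);
                }
            }
        }
        return 1.0 / Math.sqrt((double) best);
    }

    /** Test quantity S_2 = d_2^* / d_2 with d_2^* = c_2 m^(-1/2) and c_2 = (4/3)^(-1/4). */
    static double s2(long m, long a) {
        double dStar = Math.pow(4.0 / 3.0, -0.25) / Math.sqrt((double) m);
        return dStar / d2(m, a);
    }

    public static void main(String[] args) {
        // Generator of Fig. 4.1: a = 3, m = 7
        System.out.printf("d_2 = %.4f, S_2 = %.4f%n", d2(7, 3), s2(7, 3));
    }
}
```

Because of the bound (4.5.8), $S_2$ can never exceed 1. A value close to 1 indicates an evenly spread lattice.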

4.6  Implementation  and  Portability  of  an  MLCG 

By implementation of an algorithm one means its realization as a computer program for a specific type of computer. If the program can be easily transferred to other computer types and gives (essentially) the same results there, then the program is said to be portable. In this section we will give a portable implementation of an MLCG, as realized by Wichmann and Hill [4] and L'Ecuyer [3].




A program that implements the rule (4.4.1) is certain to be portable if the computations are carried out exclusively with integers. If the computer has $k$ bits for the representation of an integer, then all numbers between $-m-1$ and $m$ are available for $m < 2^{k-1}$.

We now choose a multiplier $a$ with

$$a^2 < m \qquad (4.6.1)$$

and define

$$q = m \;\mathrm{div}\; a , \qquad r = m \bmod a , \qquad (4.6.2)$$

so that

$$m = aq + r . \qquad (4.6.3)$$

The expression $m \;\mathrm{div}\; a$ defined by (4.6.2) and (4.6.3) is the integer part of the quotient $m/a$. We now compute the right-hand side of (4.4.1), where we leave off the index $j$ and note that $[(x \;\mathrm{div}\; q)\,m] \bmod m = 0$, since $x \;\mathrm{div}\; q$ is an integer:

$$\begin{aligned}
[ax] \bmod m &= [ax - (x \;\mathrm{div}\; q)\,m] \bmod m \\
&= [ax - (x \;\mathrm{div}\; q)(aq + r)] \bmod m \\
&= [a\{x - (x \;\mathrm{div}\; q)\,q\} - (x \;\mathrm{div}\; q)\,r] \bmod m \\
&= [a(x \bmod q) - (x \;\mathrm{div}\; q)\,r] \bmod m . \qquad (4.6.4)
\end{aligned}$$

Since one always has $0 < x < m$, it follows that

$$a(x \bmod q) < aq \le m , \qquad (4.6.5)$$

and, because $r < a \le q$ as a consequence of (4.6.1),

$$(x \;\mathrm{div}\; q)\,r \le [(aq + r) \;\mathrm{div}\; q]\,r = ar < a^2 < m . \qquad (4.6.6)$$

In this way both terms in square brackets in the last line of (4.6.4) are less than $m$, so that the bracketed expression remains in the interval between $-m$ and $m$.

In the Java class DatanRandom we have implemented the expression (4.6.4) in the following three lines, in which all variables are integer:


k  =  x  /  Q; 

x  =  A  *  (x  -  k  *  Q)  -  k  *  R; 
if(x  <  0)  x  =  x  +  M; 


One should note that division of two integer variables results directly in the integer part of the quotient. The first line therefore yields $x \;\mathrm{div}\; q$. The second line yields the bracketed expression in the last line of (4.6.4), and the last line yields $(ax) \bmod m$.

The method DatanRandom.mlcg yields a partial sequence of random numbers of length $N$. Each time the method is called, an additional partial sequence is produced. The period of the entire sequence is $m - 1 = 2147483562$. The computation is carried out entirely with integer arithmetic, ensuring portability. The output values are, however, floating-point valued because of the division by $m$, and therefore correspond to a uniform distribution between 0 and 1.
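The following is a self-contained version of the three-line update. It is a minimal sketch, not the DatanRandom source; the class name SimpleMlcg is hypothetical.

The constants $m = 2147483563$ and $a = 40014$ are taken from Table 4.1. They satisfy $a^2 < m$, which gives $q = 53668$ and $r = 12211$. The integer state returned by `seed()` plays the role of the seed discussed below.

```java
/**
 * Hypothetical class SimpleMlcg: a minimal MLCG x_{j+1} = (a x_j) mod m evaluated with the
 * decomposition (4.6.4), so that only 32-bit int arithmetic is needed.
 */
class SimpleMlcg {
    static final int M = 2147483563;   // prime modulus from Table 4.1
    static final int A = 40014;        // multiplier from Table 4.1, A * A < M as required by (4.6.1)
    static final int Q = M / A;        // q = m div a  (53668)
    static final int R = M % A;        // r = m mod a  (12211)

    private int x;                     // current integer state; storing it acts as the seed

    SimpleMlcg(int seed) {
        if (seed < 1 || seed >= M) {
            throw new IllegalArgumentException("seed must lie in 1 .. M-1");
        }
        x = seed;
    }

    /** Advances the state, x -> (A x) mod M, without any int overflow. */
    int nextInt() {
        int k = x / Q;                 // x div q
        x = A * (x - k * Q) - k * R;   // a (x mod q) - (x div q) r, lies between -M and M
        if (x < 0) x += M;             // map the result to (A x) mod M
        return x;
    }

    /** Floating-point value in (0, 1), obtained by dividing the integer state by M. */
    double nextDouble() {
        return nextInt() / (double) M;
    }

    /** Current state, e.g. to be stored before an interruption and passed to the constructor later. */
    int seed() {
        return x;
    }
}
```

Since $M$ is prime and $1 \le x \le M-1$, the state can never become 0. Consequently `nextDouble()` never returns exactly 0.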

Often  one  would  like  to  interrupt  a  computation  requiring  many  random 
numbers  and  continue  it  later  starting  from  the  same  place.  In  this  case  one  can 
read  out  and  store  the  last  computed  (integer)  random  number  directly  before 
the  interruption,  and  use  it  later  for  producing  the  next  random  number.  In  the 
technical  terminology  one  calls  such  a  number  the  seed  of  the  generator. 

It  is  sometimes  desirable  to  be  able  to  produce  non-overlapping  partial 
sequences  of  random  numbers  not  one  after  the  other  but  rather  independently. 
In  this  way  one  can,  for  example,  carry  out  parts  of  larger  simulation  problems 
simultaneously  on  several  computers.  As  seeds  for  such  partial  sequences  one 
uses  elements  of  the  total  sequence  separated  by  an  amount  greater  than  the 
length  of  each  partial  sequence.  Such  seeds  can  be  determined  without  having 
to  run  through  the  entire  sequence.  From  (4.4.1)  it  follows  that 

$$x_{j+n} = (a^n x_j) \bmod m = [(a^n \bmod m)\, x_j] \bmod m . \qquad (4.6.7)$$

L'Ecuyer [3] suggests setting $n = 2^d$ and choosing some seed $x_0$. The expression $a^{2^d} \bmod m$ can be computed by beginning with $a$ and squaring it $d$ times modulo $m$. Then one computes $x_n$ using (4.6.7) and obtains correspondingly $x_{2n}, x_{3n}, \ldots$.
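The following sketch computes such jump-ahead seeds. For brevity it uses Java's 64-bit `long` products instead of the decomposition (4.6.4); this simplification is ours. It is safe because every factor is below $m < 2^{31}$, so each product stays below $2^{62}$. The class and method names are hypothetical.

```java
/** Hypothetical helper: seeds x_0, x_n, x_2n, ... for n = 2^d, following (4.6.7). */
class JumpAhead {
    /** Returns a^(2^d) mod m by squaring a, d times, modulo m. */
    static long powerOfTwoExponent(long a, int d, long m) {
        long p = a % m;
        for (int i = 0; i < d; i++) {
            p = (p * p) % m;           // p < m < 2^31, so p * p < 2^62 fits in a long
        }
        return p;
    }

    /** Seeds for `count` non-overlapping partial sequences of length at most n = 2^d. */
    static long[] seeds(long x0, long a, long m, int d, int count) {
        long an = powerOfTwoExponent(a, d, m);   // a^n mod m, computed once
        long[] s = new long[count];
        s[0] = x0;
        for (int k = 1; k < count; k++) {
            s[k] = (an * s[k - 1]) % m;          // x_{kn} = [(a^n mod m) x_{(k-1)n}] mod m
        }
        return s;
    }
}
```

With $d = 5$ the seeds are $x_0, x_{32}, x_{64}, \ldots$. These are exactly the values one would reach by stepping the generator 32 times between seeds.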


4.7  Combination  of  Several  MLCGs 

Since the period of an MLCG is at most $m - 1$, and since $m$ is restricted to $m \le 2^{k-1} - 1$, where $k$ is the number of bits available in the computer for the representation of an integer, only a relatively short period can be attained with a single MLCG. Wichmann and Hill [4] and L'Ecuyer [3] have given a procedure for combining several MLCGs, which allows for very long periods. The technique is based on the following two theorems.

Theorem on the sum of discrete random variables, one of which comes from a discrete uniform distribution: If $X_1, \ldots, X_\ell$ are independent random variables that can only assume integer values, and if $X_1$ follows a discrete uniform distribution, so that

$$P(X_1 = n) = \frac{1}{d} , \qquad n = 0, 1, \ldots, d-1 ,$$

then

$$x = \left( \sum_{j=1}^{\ell} X_j \right) \bmod d \qquad (4.7.1)$$

also follows this distribution.


We first demonstrate the proof for $\ell = 2$, using the abbreviations $\min(X_2) = a$, $\max(X_2) = b$. One has

$$\begin{aligned}
P(x = n) &= \sum_{k=-\infty}^{\infty} P(X_1 + X_2 = n + kd) \\
&= \sum_{i=a}^{b} P(X_2 = i)\, P(X_1 = (n - i) \bmod d) \\
&= \sum_{i=a}^{b} P(X_2 = i)\, \frac{1}{d} = \frac{1}{d} .
\end{aligned}$$

For $\ell = 3$ we first construct the variable $x' = (X_1 + X_2) \bmod d$, which follows a discrete uniform distribution between 0 and $d - 1$. We then form the sum $x' + X_3$, which has only two terms and therefore possesses the same property, since $(x' + X_3) \bmod d = (X_1 + X_2 + X_3) \bmod d$. The generalization for $\ell > 3$ is obvious.
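For $\ell = 2$ the theorem can be checked exactly by enumerating all value combinations. In this sketch $X_1$ is uniform on $\{0, \ldots, d-1\}$, and $X_2$ has an arbitrary probability table on the integers `lo`, `lo + 1`, ..., which the user chooses (a hypothetical input). `Math.floorMod` keeps the residue in $\{0, \ldots, d-1\}$ even for negative sums.

```java
/** Hypothetical helper: exact distribution of x = (X1 + X2) mod d, cf. (4.7.1) with l = 2. */
class ModSumCheck {
    /** pmf2[i] is P(X2 = lo + i); returns P(x = n) for n = 0 .. d-1. */
    static double[] pmfOfModSum(int d, int lo, double[] pmf2) {
        double[] result = new double[d];
        for (int n1 = 0; n1 < d; n1++) {               // X1 = n1 with probability 1/d
            for (int i = 0; i < pmf2.length; i++) {     // X2 = lo + i with probability pmf2[i]
                int n = Math.floorMod(n1 + lo + i, d);  // value of (X1 + X2) mod d
                result[n] += (1.0 / d) * pmf2[i];
            }
        }
        return result;
    }
}
```

Whatever table is chosen for $X_2$, every entry of the result equals $1/d$ up to rounding, as the proof requires.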


Theorem on the period of a family of generators: Consider the random variables $x_{j,i}$ coming from a generator $j$ with a period $p_j$, so that the generator gives a sequence $x_{j,0}, x_{j,1}, \ldots, x_{j,p_j-1}$. We consider now $\ell$ generators $j = 1, 2, \ldots, \ell$ and the sequence of $\ell$-tuples

$$\{x_{1,i}, x_{2,i}, \ldots, x_{\ell,i}\} . \qquad (4.7.2)$$

Its period $p$ is the least common multiple of the periods $p_1, p_2, \ldots, p_\ell$ of the individual generators. The proof is obtained directly from the fact that $p$ is clearly a multiple of each $p_j$.

We now determine the maximum value of the period $p$. If the $\ell$ individual MLCGs have prime numbers $m_j$ as moduli, then their periods are $p_j = m_j - 1$ and are therefore even. Therefore one has

$$p \le \frac{(m_1 - 1)(m_2 - 1) \cdots (m_\ell - 1)}{2^{\ell - 1}} . \qquad (4.7.3)$$

Equality results if the quantities $(m_j - 1)/2$ pairwise possess no common factors.

The first theorem of this section can now be used to construct a sequence with period given by (4.7.3). One first forms the integer quantity

$$z_i = \left( \sum_{j=1}^{\ell} (-1)^{j-1} x_{j,i} \right) \bmod (m_1 - 1) . \qquad (4.7.4)$$


The alternating sign in (4.7.4), which simplifies the construction of the modulus function, does not contradict the prescription of (4.7.1), since one could also use in place of $X_2, X_4, \ldots$ the variables $X_2' = -X_2$, $X_4' = -X_4$, ....

The quantity $z_i$ can take on the values

$$z_i \in \{0, 1, \ldots, m_1 - 2\} . \qquad (4.7.5)$$

The transformation to floating-point numbers

$$u_i = \begin{cases} z_i / m_1 , & z_i > 0 , \\ (m_1 - 1)/m_1 , & z_i = 0 , \end{cases} \qquad (4.7.6)$$

gives values in the range $0 < u_i < 1$.

In the method DatanRandom.ecuy we use the techniques assembled above to produce uniformly distributed random numbers with a long period. We combine two MLCGs with $m_1 = 2147483563$, $a_1 = 40014$, $m_2 = 2147483399$, $a_2 = 40692$. The numbers $(m_1 - 1)/2$ and $(m_2 - 1)/2$ have no common factor. Therefore the period of the combined generator is, according to (4.7.3),

$$p = (m_1 - 1)(m_2 - 1)/2 \approx 2.3 \cdot 10^{18} .$$
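The statement about the period can be checked directly by computing the least common multiple of the two individual periods. This is a small sketch with hypothetical names. `Math.multiplyExact` throws an exception instead of overflowing silently, and here the product still fits into a `long`.

```java
/** Hypothetical helper: period of the two-generator combination as lcm(m1 - 1, m2 - 1). */
class CombinedPeriod {
    static long gcd(long x, long y) {
        while (y != 0) {               // Euclid's algorithm
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    /** lcm(p1, p2) = (p1 / gcd) * p2. */
    static long lcm(long p1, long p2) {
        return Math.multiplyExact(p1 / gcd(p1, p2), p2);
    }

    public static void main(String[] args) {
        long p1 = 2147483563L - 1, p2 = 2147483399L - 1;   // periods m_j - 1 of the two MLCGs
        System.out.println("gcd = " + gcd(p1, p2) + ", period = " + lcm(p1, p2));
    }
}
```

The greatest common divisor of $m_1 - 1$ and $m_2 - 1$ turns out to be 2. The period is therefore exactly $(m_1 - 1)(m_2 - 1)/2$.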


The absolute values of all integers occurring during the computation remain below $2^{31} - 85$. The resulting floating-point values $u$ are in the range $0 < u < 1$. One does not obtain the values 0 or 1, at least if 23 or more bits are available for the mantissa, which is almost always the case when representing floating-point numbers with 32 bit. The program with the given values of $m_1$, $m_2$, $a_1$, $a_2$ has been subjected to the spectral test and to many other tests by L'Ecuyer [3], who has provided a PASCAL version. He determined that it satisfied all of the requirements of the tests.
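The following sketch follows (4.7.4) to (4.7.6) for $\ell = 2$, using the constants given above and the integer update (4.6.4) for each component. It is not the DatanRandom.ecuy source or L'Ecuyer's PASCAL version, and the class name is hypothetical.

```java
/** Hypothetical class: combination of two MLCGs following (4.7.4)-(4.7.6). */
class CombinedMlcg {
    static final int M1 = 2147483563, A1 = 40014;
    static final int M2 = 2147483399, A2 = 40692;

    private int x1, x2;                 // states (seeds) of the two MLCGs

    CombinedMlcg(int seed1, int seed2) {
        if (seed1 < 1 || seed1 >= M1 || seed2 < 1 || seed2 >= M2) {
            throw new IllegalArgumentException("seeds must lie in 1 .. M-1");
        }
        x1 = seed1;
        x2 = seed2;
    }

    /** One step x -> (a x) mod m with the decomposition (4.6.4); requires a * a < m. */
    static int step(int x, int a, int m) {
        int q = m / a, r = m % a;
        int k = x / q;
        int y = a * (x - k * q) - k * r;
        return y < 0 ? y + m : y;
    }

    /** Next uniform number in (0, 1). */
    double next() {
        x1 = step(x1, A1, M1);
        x2 = step(x2, A2, M2);
        int z = Math.floorMod(x1 - x2, M1 - 1);   // (4.7.4) with alternating sign: z in 0 .. M1-2
        return z > 0 ? (double) z / M1            // (4.7.6)
                     : (double) (M1 - 1) / M1;
    }
}
```

The difference `x1 - x2` always fits into an `int`, because both states are positive and below $2^{31}$.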

Figure 4.3 illustrates the difference between the simple MLCG and the combined generator. For the simple MLCG one can still recognize a structure in a scatter plot of the number pairs (4.5.4), although only after expanding the abscissa by a factor of 1000. The corresponding diagram for the combined generator appears, in contrast, to be completely without structure. For each diagram one million pairs of random numbers were generated. The plots correspond only to a narrow strip on the left-hand edge of the unit square.

In  order  to  initialize  non-overlapping  partial  sequences  one  can  use  two 
methods: 


(i)  One  applies  the  procedure  discussed  in  connection  with  (4.6.7)  to  both 
MLCGs,  naturally  with  the  same  value  n,  in  order  to  construct  pairs  of 
seeds  for  each  partial  sequence. 
(ii) It is considerably easier to use the same seed for the first MLCG for every partial sequence. For the second MLCG one uses an arbitrary seed for the first partial sequence, the following random number from the second MLCG for the second partial sequence, etc. In this way one obtains partial sequences that can reach a length of $(m_1 - 1)$ without overlapping.


Fig. 4.3: Scatter plots of number pairs (4.5.4) from (a) an MLC generator and (b) a combined generator. The methods DatanRandom.mlcg and DatanRandom.ecuy, respectively, were used in the generation.


4.8 Generation of Arbitrarily Distributed Random Numbers

4.8.1  Generation  by  Transformation  of  the  Uniform  Distribution 

If $x$ is a random variable following the uniform distribution,

$$f(x) = 1 , \quad 0 < x < 1 ; \qquad f(x) = 0 , \quad x < 0 , \; x > 1 , \qquad (4.8.1)$$

and $y$ is a random variable described by the probability density $g(y)$, the transformation (3.7.1) simplifies to

$$g(y)\,dy = dx . \qquad (4.8.2)$$
We use the distribution function $G(y)$, which is related to $g(y)$ through $dG(y)/dy = g(y)$, and write (4.8.2) in the form

$$dx = g(y)\,dy = dG(y) , \qquad (4.8.3)$$

or after integration,

$$x = G(y) = \int_{-\infty}^{y} g(t)\,dt . \qquad (4.8.4)$$

This equation has the following meaning. If a random number $x$ is taken from a uniform distribution between 0 and 1 and the function $x = G(y)$ is inverted,

$$y = G^{-1}(x) , \qquad (4.8.5)$$

then one obtains a random number $y$ described by the probability density $g(y)$. The relationship is depicted in Fig. 4.4a. The probability to obtain a random number $x$ between $x$ and $x + dx$ is equal to the probability to have a value $y(x)$ between $y$ and $y + dy$.


Fig. 4.4: Transformation from a uniformly distributed variable $x$ to a variable $y$ with the distribution function $G(y)$. The variable $y$ can be continuous (a) or discrete (b).


The relationship (4.8.4) can also be used to produce discrete probability distributions. An example is shown in Fig. 4.4b. The random variable $y$ can take on the values $y_1, y_2, \ldots, y_n$ with the probabilities $P(y_1), P(y_2), \ldots, P(y_n)$. The distribution function as given by (3.2.1) is $G(y) = P(\mathsf{y} < y)$. The construction of a step function $x = G(y)$ according to this equation gives the values

$$x_j = G(y_j) = \sum_{k=1}^{j} P(y_k) , \qquad (4.8.6)$$

which lie in the range between 0 and 1. From this one can produce random numbers according to a discrete distribution $G(y)$ by first producing random numbers $x$ uniformly distributed between 0 and 1. Depending on the interval $j$ in which $x$ falls, $x_{j-1} < x < x_j$ (with $x_0 = 0$), the number $y_j$ is then produced.
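The discrete transformation (4.8.6) can be sketched as follows. The partial sums $x_j$ are stored once, and a binary search then finds the interval containing a uniform number $x$. The class name is hypothetical. Indices start at 0, so index 0 stands for $y_1$. Any uniform source, for example one of the generators above, can supply $x$.

```java
/** Hypothetical class: discrete random numbers by inverting the step function (4.8.6). */
class DiscreteSampler {
    private final double[] cumulative;          // cumulative[j] = P(y_1) + ... + P(y_{j+1})

    DiscreteSampler(double[] probabilities) {
        cumulative = new double[probabilities.length];
        double sum = 0.0;
        for (int j = 0; j < probabilities.length; j++) {
            sum += probabilities[j];
            cumulative[j] = sum;
        }
    }

    /** Returns the index j of the first partial sum that is >= x, for a uniform x in [0, 1). */
    int sample(double x) {
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {                       // binary search over the partial sums
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] < x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;                              // the last index also absorbs rounding of the total
    }
}
```

The binary search costs $O(\log n)$ comparisons per random number. This pays off when $y$ can take on many values.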


Example 4.1: Exponentially distributed random numbers

We would like to generate random numbers according to the probability density

$$g(t) = \begin{cases} \dfrac{1}{\tau}\, e^{-t/\tau} , & t \ge 0 , \\ 0 , & t < 0 . \end{cases} \qquad (4.8.7)$$

This is the probability density describing the time $t$ of the decay of a radioactive nucleus that exists at time $t = 0$ and has a mean lifetime $\tau$. The distribution function is

$$x = G(t) = \int_{t'=0}^{t} g(t')\,dt' = \frac{1}{\tau} \int_{t'=0}^{t} e^{-t'/\tau}\,dt' = 1 - e^{-t/\tau} . \qquad (4.8.8)$$

According to (4.8.4) and (4.8.5) we can obtain exponentially distributed random numbers $t$ by first generating random numbers $x$ uniformly distributed between 0 and 1 and then finding the inverse function $t = G^{-1}(x)$, i.e.,

$$t = -\tau \ln(1 - x) .$$

Since $1 - x$ is also uniformly distributed between 0 and 1, it is sufficient to compute

$$t = -\tau \ln x . \qquad (4.8.9)$$

■
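Formula (4.8.9) is sketched below. The class name is hypothetical. As an assumption of the sketch, `java.util.Random` serves as the uniform source in place of the generators above. Its `nextDouble()` lies in $[0, 1)$, so the sketch uses $1 - $ `nextDouble()`, which lies in $(0, 1]$ and avoids $\ln 0$.

```java
import java.util.Random;

/** Hypothetical class: exponentially distributed decay times with mean lifetime tau, cf. (4.8.9). */
class ExponentialSampler {
    static double decayTime(double tau, Random rng) {
        double x = 1.0 - rng.nextDouble();   // uniform in (0, 1], never exactly 0
        return -tau * Math.log(x);           // t = -tau ln x
    }
}
```

The sample mean of many such times should approach $\tau$.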


Example 4.2: Generation of random numbers following a Breit-Wigner distribution

To generate random numbers $y$ which follow a Breit-Wigner distribution (3.3.32),

$$g(y) = \frac{2}{\pi\Gamma}\, \frac{\Gamma^2}{4(y - a)^2 + \Gamma^2} ,$$

we proceed as discussed in Sect. 4.8.1. We form the distribution function

$$x = G(y) = \int_{-\infty}^{y} g(y')\,dy' = \frac{2}{\pi\Gamma} \int_{-\infty}^{y} \frac{\Gamma^2}{4(y' - a)^2 + \Gamma^2}\,dy'$$

and perform the integration using the substitution

$$u = \frac{2(y' - a)}{\Gamma} , \qquad du = \frac{2}{\Gamma}\,dy' .$$

Thus we obtain

$$x = G(y) = \frac{1}{\pi} \int_{u=-\infty}^{u=2(y-a)/\Gamma} \frac{du}{1 + u^2} = \frac{1}{\pi}\,[\arctan u]_{-\infty}^{2(y-a)/\Gamma} = \frac{1}{\pi} \arctan\frac{2(y - a)}{\Gamma} + \frac{1}{2} .$$

By inversion we obtain

$$\frac{2(y - a)}{\Gamma} = \tan\left\{\pi\left(x - \frac{1}{2}\right)\right\} ,$$

$$y = a + \frac{\Gamma}{2} \tan\left\{\pi\left(x - \frac{1}{2}\right)\right\} . \qquad (4.8.10)$$

If $x$ are random numbers uniformly distributed in the interval $0 < x < 1$, then $y$ follows a Breit-Wigner distribution. ■
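Formula (4.8.10) can be sketched directly. The class and method names are hypothetical, and `java.util.Random` stands in for the uniform generator. The parameter $a$ is the position of the peak and $\Gamma$ the full width at half maximum.

```java
import java.util.Random;

/** Hypothetical class: Breit-Wigner random numbers by the inverse transformation (4.8.10). */
class BreitWignerSampler {
    /** y = a + (gamma/2) tan(pi (x - 1/2)) for a uniform x in (0, 1). */
    static double fromUniform(double x, double a, double gamma) {
        return a + 0.5 * gamma * Math.tan(Math.PI * (x - 0.5));
    }

    static double sample(double a, double gamma, Random rng) {
        return fromUniform(rng.nextDouble(), a, gamma);
    }
}
```

For example, $x = 3/4$ gives $y = a + \Gamma/2$. In general, half of all generated values fall within $a \pm \Gamma/2$.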


Example 4.3: Generation of random numbers with a triangular distribution

In order to generate random numbers $y$ following a triangular distribution as in Problem 3.2 we form the distribution function

$$F(y) = \begin{cases} 0 , & y < a , \\ \dfrac{(y - a)^2}{(b - a)(c - a)} , & a \le y < c , \\ 1 - \dfrac{(y - b)^2}{(b - a)(b - c)} , & c \le y < b , \\ 1 , & b \le y . \end{cases}$$

In particular we have

$$F(c) = \frac{c - a}{b - a} .$$

Inverting $x = F(y)$ gives

$$y = a + \sqrt{(b - a)(c - a)\,x} , \qquad x < (c - a)/(b - a) ,$$

$$y = b - \sqrt{(b - a)(b - c)(1 - x)} , \qquad x \ge (c - a)/(b - a) .$$

If $x$ is uniformly distributed with $0 < x < 1$, then $y$ follows a triangular distribution. ■
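The two branches of the inverse function are sketched below. The class name and parameter order are hypothetical. Here $a$ and $b$ are the ends of the interval, and $c$ is the position of the maximum, $a < c < b$.

```java
/** Hypothetical class: triangular random numbers by inverting F(y) from Example 4.3. */
class TriangularSampler {
    /** x uniform in (0, 1); a < c < b, with the maximum of the density at y = c. */
    static double fromUniform(double x, double a, double b, double c) {
        double fc = (c - a) / (b - a);                        // F(c)
        if (x < fc) {
            return a + Math.sqrt((b - a) * (c - a) * x);      // rising edge, a <= y < c
        }
        return b - Math.sqrt((b - a) * (b - c) * (1.0 - x));  // falling edge, c <= y < b
    }
}
```

For $a = 0$, $b = 2$, $c = 1$ the value $x = F(c) = 1/2$ is mapped to $y = 1$, while $x = 1/8$ and $x = 7/8$ are mapped to $0.5$ and $1.5$.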


4.8.2 Generation with the von Neumann Acceptance-Rejection Technique

The elegant technique of the previous section requires that the distribution function $x = G(y)$ be known and that the inverse function $y = G^{-1}(x)$ exist and be known as well.
Often one only knows the probability density $g(y)$. One can then use the von Neumann acceptance-rejection technique, which we introduce with a simple example before discussing it in its general form.


Example 4.4: Semicircle distribution with the simple acceptance-rejection method

As a simple example we generate random numbers following a semicircular probability density,

$$g(y) = \begin{cases} \dfrac{2}{\pi R^2} \sqrt{R^2 - y^2} , & |y| \le R , \\ 0 , & |y| > R . \end{cases} \qquad (4.8.11)$$


Instead of trying to find and invert the distribution function $G(y)$, we generate pairs of random numbers $(y_i, u_i)$. Here $y_i$ is uniformly distributed in the interval available to $y$, $-R < y < R$, and $u_i$ is uniformly distributed in the range of values assumed by the function $g(y)$, $0 < u < R$. For each pair we test if

$$u_i > g(y_i) . \qquad (4.8.12)$$

If this inequality is fulfilled, we reject the random number $y_i$. The set of random numbers $y_i$ that are not rejected then follows the probability density $g(y)$, since each was accepted with a probability proportional to $g(y_i)$. ■
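A sketch of Example 4.4 follows, with a hypothetical class name. As in Fig. 4.5 it uses the unnormalized $g(y) = \sqrt{R^2 - y^2}$, whose maximum is $R$, so $u$ can be drawn from $(0, R)$. Normalizing $g$ would not change the shape of the resulting distribution. `java.util.Random` stands in for the uniform generator (an assumption).

```java
import java.util.Random;

/** Hypothetical class: semicircle random numbers by simple acceptance-rejection (Example 4.4). */
class SemicircleSimple {
    static double sample(double radius, Random rng) {
        while (true) {
            double y = radius * (2.0 * rng.nextDouble() - 1.0);  // uniform in [-R, R)
            double u = radius * rng.nextDouble();                // uniform in [0, R)
            double g = Math.sqrt(radius * radius - y * y);       // unnormalized g(y)
            if (u <= g) {                                        // reject if (4.8.12) holds
                return y;
            }
        }
    }
}
```

On average $4/\pi \approx 1.27$ pairs are needed per accepted number, which corresponds to the efficiency $\pi/4$.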

The technique of Example 4.4 can easily be described geometrically. To generate random numbers in the interval $a < y < b$ according to the probability density $g(y)$, one must consider in the region $a < y < b$ the curve

$$u = g(y) \qquad (4.8.13)$$

and a constant

$$u = d , \qquad d \ge g_{\max} , \qquad (4.8.14)$$

which is greater than or equal to the maximum value of $g(y)$ in that region. In the $(y, u)$-plane this constant is described by the line $u = d$. Pairs of random numbers $(y_i, u_i)$ uniformly distributed in the interval $a < y_i < b$, $0 < u_i < d$ correspond to a uniform distribution of points in the corresponding rectangle of the $(y, u)$-plane. If all of the points for which (4.8.12) holds are rejected, then only points under the curve $u = g(y)$ remain. Figure 4.5 shows this situation for Example 4.4. [It is clear that the technique also gives meaningful results if the function is not normalized to one. In Fig. 4.5 we have simply set $g(y) = \sqrt{R^2 - y^2}$ and $R = 1$.]

For the transformation technique of Sect. 4.8.1, each random number $y_i$ required only that exactly one random number $x_i$ be generated from a uniform distribution and that it be transformed according to (4.8.5). In the acceptance-rejection technique, pairs $y_i$, $u_i$ must always be generated, and a considerable fraction of the numbers $y_i$ (depending on the value of $u_i$ according to (4.8.12)) are rejected. The probability for $y_i$ to be accepted is

$$E = \frac{\int_a^b g(y)\,dy}{(b - a)\,d} . \qquad (4.8.15)$$

Fig. 4.5: All the pairs $(y_i, u_i)$ produced are marked as points in the $(y, u)$-plane. Points above the curve $u = g(y)$ (small points) are rejected.

We can call $E$ the efficiency of the procedure. If the interval $a < y < b$ includes the entire allowed range of $y$ and $g(y)$ is normalized, then the numerator of (4.8.15) is equal to unity, and one obtains

$$E = \frac{1}{(b - a)\,d} . \qquad (4.8.16)$$

The numerator and denominator of (4.8.15) are simply the areas contained in the region $a < y < b$ under the curves (4.8.13) and (4.8.14), respectively. One distributes points $(y_i, u_i)$ uniformly under the curve (4.8.14) and rejects the random numbers $y_i$ if the inequality (4.8.12) holds. The efficiency of the procedure is certainly higher if one uses as the upper curve not the constant (4.8.14) but rather a curve that is closer to $g(y)$.

With this in mind the acceptance-rejection technique can be stated in its general form:

(i) One finds a probability density $s(y)$ that is sufficiently simple that random numbers can be generated from it using the transformation method, and a constant $c$ such that

$$g(y) \le c \cdot s(y) , \qquad a < y < b , \qquad (4.8.17)$$

holds.

(ii) One generates one random number $y$ following the probability density $s(y)$ in the interval $a < y < b$ (in the simplest case, $s(y) = 1/(b - a)$, $y$ is uniformly distributed) and a second random number $u$ uniformly distributed in the interval $0 < u < 1$.

(iii) One rejects $y$ if

$$u > \frac{g(y)}{c \cdot s(y)} . \qquad (4.8.18)$$

After steps (ii) and (iii) have been repeated enough times, the resulting set of accepted random numbers $y$ follows the probability density $g(y)$. This is because the probability that a number $y < Y$ is both generated and accepted is

$$\int_a^{Y} s(t)\, \frac{g(t)}{c \cdot s(t)}\,dt = \frac{1}{c} \int_a^{Y} g(t)\,dt = \frac{1}{c}\,[G(Y) - G(a)] ,$$

which is proportional to $G(Y) - G(a)$. If the interval $a < y < b$ includes the entire range of $y$ for both $g(y)$ and $s(y)$, then one obtains an efficiency

$$E = \frac{1}{c} . \qquad (4.8.19)$$


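Example 4.5: Semicircle distribution with the general acceptance-rejection method

One chooses for $c \cdot s(y)$ the polygon

$$c \cdot s(y) = \begin{cases} 0 , & y < -R , \\ 3R/2 + y , & -R \le y < -R/2 , \\ R , & -R/2 \le y < R/2 , \\ 3R/2 - y , & R/2 \le y < R , \\ 0 , & R \le y . \end{cases}$$

The efficiency is clearly

$$E = \frac{\pi R^2}{2} \cdot \frac{1}{2R^2 - R^2/4} = \frac{2\pi}{7}$$

in comparison to

$$E = \frac{\pi R^2}{2} \cdot \frac{1}{2R^2} = \frac{\pi}{4}$$

as in Example 4.4. ■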
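Example 4.5 can be sketched as follows, again with the unnormalized $g(y) = \sqrt{R^2 - y^2}$ and with hypothetical names. Step (ii) draws $y$ from $s(y)$ by inverting the cumulative area under the polygon, which is our own derivation for this sketch.

On the rising edge, the area from $-R$ to $y$ is $y^2/2 + 3Ry/2 + R^2$. The flat part and the falling edge follow by linearity and by symmetry. The total area is $7R^2/4$. `java.util.Random` stands in for the uniform generator.

```java
import java.util.Random;

/** Hypothetical class: Example 4.5, general acceptance-rejection with a polygon envelope c*s(y). */
class SemicirclePolygon {
    /** Envelope c*s(y): rises from R/2 to R on [-R, -R/2], equals R in between, falls on [R/2, R]. */
    static double envelope(double y, double r) {
        double ay = Math.abs(y);
        if (ay > r) return 0.0;
        return ay <= 0.5 * r ? r : 1.5 * r - ay;
    }

    /** Inverse of the cumulative area under c*s(y); area runs from 0 to 7R^2/4. */
    static double fromArea(double area, double r) {
        double total = 1.75 * r * r;
        if (area > 0.5 * total) return -fromArea(total - area, r);   // right half by symmetry
        double left = 0.375 * r * r;                                 // area of the rising edge
        if (area < left) return -1.5 * r + Math.sqrt(0.25 * r * r + 2.0 * area);
        return -0.5 * r + (area - left) / r;                         // flat middle part
    }

    /** Returns one accepted y; counter[0] counts all generated candidates (for the efficiency). */
    static double sample(double r, Random rng, long[] counter) {
        while (true) {
            counter[0]++;
            double y = fromArea(1.75 * r * r * rng.nextDouble(), r);  // step (ii): y follows s(y)
            double u = rng.nextDouble();                              // step (ii): u uniform
            double g = Math.sqrt(Math.max(0.0, r * r - y * y));
            if (u <= g / envelope(y, r)) return y;                    // step (iii): reject if (4.8.18)
        }
    }
}
```

The ratio of accepted numbers to generated candidates should approach $2\pi/7 \approx 0.90$. This is clearly better than the value $\pi/4 \approx 0.79$ obtained with the rectangle of Example 4.4.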
4.9 Generation of Normally Distributed Random Numbers

By far the most important distribution for data analysis is the normal distribution, which we will discuss in Sect. 5.7. We present here a program that can produce random numbers $x$ following the standard normal distribution with the probability density

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} . \qquad (4.9.1)$$

The corresponding distribution function $F(x)$ can only be computed and inverted numerically (Appendix C). Therefore the simple transformation method of Sect. 4.8.1 cannot be used. The polar method by Box and Muller [5] described here combines in an elegant way acceptance-rejection with transformation. The algorithm consists of the following steps:

(i) Generate two independent random numbers $u_1$, $u_2$ from a uniform distribution between 0 and 1. Transform $v_1 = 2u_1 - 1$, $v_2 = 2u_2 - 1$.

(ii) Compute $s = v_1^2 + v_2^2$.

(iii) If $s \ge 1$, return to step (i).

(iv) $x_1 = v_1 \sqrt{-(2/s) \ln s}$ and $x_2 = v_2 \sqrt{-(2/s) \ln s}$ are two independent random numbers following the standard normal distribution.
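Steps (i) to (iv) translate almost line by line into the sketch below. It is not the DatanRandom source; the class name is hypothetical, and `java.util.Random` provides the uniform numbers. Besides rejecting $s \ge 1$, the loop also rejects the practically impossible value $s = 0$, for which $\ln s$ diverges; this extra guard is our addition.

```java
import java.util.Random;

/** Hypothetical class: polar method for standard normal random numbers, steps (i)-(iv). */
class PolarNormal {
    /** Returns a pair of independent standard normal random numbers. */
    static double[] pair(Random rng) {
        double v1, v2, s;
        do {
            v1 = 2.0 * rng.nextDouble() - 1.0;   // step (i)
            v2 = 2.0 * rng.nextDouble() - 1.0;
            s = v1 * v1 + v2 * v2;               // step (ii)
        } while (s >= 1.0 || s == 0.0);          // step (iii), plus a guard against ln 0
        double factor = Math.sqrt(-2.0 * Math.log(s) / s);
        return new double[] {v1 * factor, v2 * factor};   // step (iv)
    }
}
```

On average a fraction $\pi/4$ of the pairs $(v_1, v_2)$ is accepted, and each accepted pair yields two normal numbers.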

The number pairs $(v_1, v_2)$ that survive step (iii) are the Cartesian coordinates of a set of points uniformly distributed inside the unit circle. We can write them as $v_1 = r\cos\theta$, $v_2 = r\sin\theta$ using the polar coordinates $r = \sqrt{s}$, $\theta = \arctan(v_2/v_1)$. The point $(x_1, x_2)$ then has the Cartesian coordinates

$$x_1 = \cos\theta \sqrt{-2\ln s} , \qquad x_2 = \sin\theta \sqrt{-2\ln s} .$$

We now ask for the probability

$$F(r) = P(\sqrt{-2\ln s} < r) = P(-2\ln s < r^2) = P(s > e^{-r^2/2}) .$$

Since $s$ is by construction uniformly distributed between 0 and 1, one has

$$F(r) = P(s > e^{-r^2/2}) = 1 - e^{-r^2/2} .$$

The probability density of $r$ is

$$f(r) = \frac{dF(r)}{dr} = r\, e^{-r^2/2} .$$

The joint distribution function of $x_1$ and $x_2$,

$$\begin{aligned}
F(x_1, x_2) &= P(\mathsf{x}_1 < x_1, \mathsf{x}_2 < x_2) = P(r\cos\theta < x_1,\; r\sin\theta < x_2) \\
&= \frac{1}{2\pi} \iint_{(\mathsf{x}_1 < x_1,\, \mathsf{x}_2 < x_2)} r\, e^{-r^2/2}\,dr\,d\theta \\
&= \frac{1}{2\pi} \iint_{(\mathsf{x}_1 < x_1,\, \mathsf{x}_2 < x_2)} e^{-(x^2 + y^2)/2}\,dx\,dy \\
&= \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_1} e^{-x^2/2}\,dx\right) \left(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_2} e^{-y^2/2}\,dy\right) ,
\end{aligned}$$

is the product of two distribution functions of the standard normal distribution. The procedure is implemented in the method DatanRandom.standardNormal and illustrated in Fig. 4.6.


Fig. 4.6: Illustration of the Box-Muller procedure. (a) Number pairs $(v_1, v_2)$ are generated that uniformly populate the square. Those pairs that do not lie inside the unit circle (marked by small points) are then rejected. (b) This is followed by the transformation $(v_1, v_2) \to (x_1, x_2)$.


Many  other  procedures  are  described  in  the  literature  for  the  generation 
of  normally  distributed  random  numbers.  They  are  to  a  certain  extent  more 
efficient,  but  are  generally  more  difficult  to  program  than  the  Box-Muller 
procedure. 

4.10 Generation of Random Numbers According to a Multivariate Normal Distribution

The probability density of a multivariate normal distribution of $n$ variables $x = (x_1, x_2, \ldots, x_n)$ is according to (5.10.1)

$$\phi(x) = k \exp\left\{-\frac{1}{2}(x - a)^{\mathrm T} B\, (x - a)\right\} .$$

Here $a$ is the vector of expectation values and $B = C^{-1}$ is the inverse of the positive-definite symmetric covariance matrix $C$. With the Cholesky decomposition $B = D^{\mathrm T} D$ and the substitution $u = D(x - a)$ the exponent takes on the simple form

$$-\frac{1}{2}\, u^{\mathrm T} u = -\frac{1}{2}\left(u_1^2 + u_2^2 + \cdots + u_n^2\right) .$$

Thus the elements $u_i$ of the vectors $u$ follow independent standard normal distributions [cf. (5.10.9)]. One obtains vectors $x$ of random numbers by first forming a vector $u$ of elements $u_i$ which follow the standard normal distribution and then performing the transformation

$$x = D^{-1} u + a .$$


This procedure is implemented in the method DatanRandom.multivariateNormal.
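The following sketch swaps in an equivalent factorization, which we name plainly. It factorizes the covariance matrix itself, $C = L L^{\mathrm T}$ with $L$ lower triangular, and uses $D = L^{-1}$. Then $D^{\mathrm T} D = (L L^{\mathrm T})^{-1} = C^{-1} = B$ and $D^{-1} u = L u$, so no matrix inverse is needed.

The class name is hypothetical. `java.util.Random.nextGaussian()` replaces the polar method as the source of standard normal numbers.

```java
import java.util.Random;

/** Hypothetical class: multivariate normal vectors x = L u + a with C = L L^T. */
class MultivariateNormalSketch {
    /** Cholesky factor L of a symmetric positive-definite matrix c, c = L L^T. */
    static double[][] cholesky(double[][] c) {
        int n = c.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = c[i][j];
                for (int k = 0; k < j; k++) sum -= l[i][k] * l[j][k];
                if (i == j) {
                    if (sum <= 0.0) throw new IllegalArgumentException("matrix not positive definite");
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    /** One random vector x = L u + a, where u has independent standard normal elements. */
    static double[] sample(double[][] l, double[] a, Random rng) {
        int n = a.length;
        double[] u = new double[n];
        for (int i = 0; i < n; i++) u[i] = rng.nextGaussian();
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = a[i];
            for (int k = 0; k <= i; k++) sum += l[i][k] * u[k];   // L is lower triangular
            x[i] = sum;
        }
        return x;
    }
}
```

The factor $L$ needs to be computed only once per covariance matrix. After that, each vector costs about $n^2/2$ multiplications.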


4.11  The  Monte  Carlo  Method  for  Integration 


It follows directly from its construction that the acceptance-rejection technique, Sect. 4.8.2, provides a very simple method for numerical integration. Suppose $N$ pairs of random numbers $(y_i, u_i)$, $i = 1, 2, \ldots, N$, are generated according to the prescription of the general acceptance-rejection technique, and $N - n$ of them are rejected because they fulfill condition (4.8.18). Then the numbers $N$ (or $n$) are proportional to the areas under the curves $c \cdot s(y)$ (or $g(y)$), at least in the limit of large $N$, i.e.,

$$\lim_{N\to\infty} \frac{n}{N} = \frac{\int_a^b g(y)\,dy}{c \int_a^b s(y)\,dy} . \qquad (4.11.1)$$

Since the function $s(y)$ is chosen to be particularly simple [in the simplest case one has $s(y) = 1/(b - a)$], the ratio $n/N$ is a direct measure of the value of the integral

$$I = \int_a^b g(y)\,dy = c \int_a^b s(y)\,dy \; \lim_{N\to\infty} \frac{n}{N} . \qquad (4.11.2)$$

Here the integrand $g(y)$ does not necessarily have to be normalized, i.e., one does not need to require

$$\int_{-\infty}^{\infty} g(y)\,dy = 1$$

as long as $c$ is chosen such that (4.8.17) is fulfilled.


Example 4.6: Computation of π

Referring to Example 4.4 we compute the integral of the semicircle (4.8.11) with $R = 1$, taken without its normalization factor as in Fig. 4.5, i.e., with $g(y) = \sqrt{1 - y^2}$:

$$I = \int_0^1 g(y)\,dy = \pi/4 .$$

Choosing $s(y) = 1$ and $c = 1$ we obtain

$$I = \lim_{N\to\infty} \frac{n}{N} .$$

Suppose $N$ points are distributed according to a uniform distribution in the square $0 < y < 1$, $0 < u < 1$, and $n$ of them lie inside the unit circle. We then expect the ratio $n/N$ to approach the value $I = \pi/4$ in the limit $N \to \infty$. Table 4.2 shows the results for various values of $n$ and for various sequences of random numbers. The exact value of $n/N$ clearly depends on the particular sequence. In Sect. 6.8 we will determine that the typical fluctuations of the number $n$ are approximately $\Delta n = \sqrt{n}$. Therefore one has for the relative precision for the determination of the integral (4.11.2)

$$\frac{\Delta I}{I} = \frac{\Delta n}{n} = \frac{1}{\sqrt{n}} . \qquad (4.11.3)$$

We therefore expect to find the value of π in the columns of Table 4.2 with precisions of 10, 1, and 0.1%. We find in fact in the three columns fluctuations in the first, second, and third places after the decimal point. ■


Table 4.2: Numerical values of 4n/N for various values of n. The entries in the columns correspond to various sequences of random numbers.

      n = 10^2    n = 10^4    n = 10^6
       3.419       3.122       3.141
       3.150       3.145       3.143
       3.279       3.159       3.144
       3.419       3.130       3.143

The Monte Carlo method of integration can now be implemented by a very simple program. For integration of single variable functions it is usually better to use other numerical techniques for reasons of computing time. For integrals with many variables, however, the Monte Carlo method is more straightforward and often faster as well.
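Such a simple program for Example 4.6 might look as follows. The class name is hypothetical, and `java.util.Random` stands in for the uniform generator. As in Table 4.2, points are generated until $n$ of them lie inside the quarter circle, and the result is $4n/N$. The particular sequences used for the table are of course not reproduced.

```java
import java.util.Random;

/** Hypothetical class: hit-or-miss Monte Carlo estimate of pi (Example 4.6). */
class MonteCarloPi {
    /** Returns 4n/N, where N is the number of points needed to obtain n hits. */
    static double estimate(long n, Random rng) {
        long hits = 0, total = 0;
        while (hits < n) {
            double y = rng.nextDouble();
            double u = rng.nextDouble();
            total++;
            if (u * u + y * y <= 1.0) {   // same as u <= g(y) = sqrt(1 - y^2)
                hits++;
            }
        }
        return 4.0 * n / total;
    }
}
```

According to (4.11.3), the scatter of the result around π should shrink roughly like $1/\sqrt{n}$.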
4.12 The Monte Carlo Method for Simulation

Many  real  situations  that  are  determined  by  statistical  processes  can  be 
simulated  in  a  computer  with  the  aid  of  random  numbers.  Examples  are 
automobile  traffic  in  a  given  system  of  streets  or  the  behavior  of  neutrons 
in  a  nuclear  reactor.  The  Monte  Carlo  method  was  originally  developed  for 
the latter problem by von Neumann and Ulam. A change of the parameters of the distributions then corresponds to a change in the actual situation. In this way the effect of additional streets or changes in the reactor can be investigated without having to undertake costly and time-consuming changes
in  the  real  system.  Not  only  processes  of  interest  following  statistical  laws  can 
be  simulated  with  the  Monte  Carlo  method,  but  also  the  measurement  errors 
which  occur  in  every  measurement. 

Example  4.7:  Simulation  of  measurement  errors  of  points  on  a  line 
We consider a line in the $(t, y)$-plane. It is described by the equation

$$y = at + b . \tag{4.12.1}$$

If we choose discrete values of $t$

$$t_0 ,\quad t_1 = t_0 + \Delta t ,\quad t_2 = t_0 + 2\Delta t ,\quad \ldots , \tag{4.12.2}$$

then they correspond to values of $y$

$$y_i = a t_i + b ,\qquad i = 0, 1, \ldots, n-1 . \tag{4.12.3}$$

We assume that the values $t_0, t_1, \ldots$ of the "controlled variable" $t$ can be set without error. Because of measurement errors, however, instead of the values $y_i$ one obtains different values

$$y'_i = y_i + \varepsilon_i . \tag{4.12.4}$$

Here $\varepsilon_i$ are the measurement errors, which follow a normal distribution with mean zero and standard deviation $\sigma_y$ (cf. Sect. 5.7). The method DatanRandom.line generates number pairs $(t_i, y'_i)$. As an example, Figure 4.7 displays 10 simulated points. ■
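The simulation of Example 4.7 can be illustrated with a short sketch. It is not the DatanRandom.line method: it assumes `java.util.Random.nextGaussian()` as the source of standard normal numbers, which are scaled by $\sigma_y$, and the class name `LineSimulation` is hypothetical.

```java
import java.util.Random;

/** Simulates measured points y'_i = a t_i + b + eps_i (hypothetical sketch, not DatanRandom.line). */
public class LineSimulation {

    /** Returns n pairs {t_i, y'_i} with t_i = t0 + i*dt and normal errors of standard deviation sigmaY. */
    public static double[][] simulate(double a, double b, double t0, double dt,
                                      int n, double sigmaY, long seed) {
        Random rng = new Random(seed);
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            double t = t0 + i * dt;                    // controlled variable, set without error (4.12.2)
            double y = a * t + b;                      // true value on the line (4.12.3)
            double eps = sigmaY * rng.nextGaussian();  // normal error with mean 0, std sigmaY
            points[i][0] = t;
            points[i][1] = y + eps;                    // measured value (4.12.4)
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] pts = simulate(0.5, 1.0, 0.0, 1.0, 10, 0.2, 7L);
        for (double[] p : pts) {
            System.out.printf("t = %5.2f   y' = %7.4f%n", p[0], p[1]);
        }
    }
}
```

Setting `sigmaY` to zero puts every point exactly on the line, which is a convenient check of the bookkeeping.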

Example  4.8:  Generation  of  decay  times  for  a  mixture  of  two  different 
radioactive  substances 

At time $t = 0$ a source consists of $N$ radioactive nuclei, of which $aN$ decay with mean lifetime $\tau_1$ and $(1 - a)N$ with mean lifetime $\tau_2$, with $0 < a < 1$. In simulating the decay times, random numbers are needed for two different purposes: to choose the type of nucleus, and to determine the decay time of the chosen nucleus, cf. (4.8.9). The method DatanRandom.radio implements this example. ■
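The two random numbers needed per nucleus in Example 4.8 can be drawn as in the following sketch. It assumes that the decay time of a nucleus with mean lifetime $\tau$ is generated by the inverse-transform formula $t = -\tau \ln u$ with $u$ uniform in $(0, 1]$, which we take to be the content of (4.8.9); it uses `java.util.Random` rather than DatanRandom.radio, and the class name is hypothetical.

```java
import java.util.Random;

/** Decay times for a mixture of two radioactive substances (hypothetical sketch). */
public class TwoComponentDecay {

    /**
     * Generates decay times for nNuclei nuclei, a fraction a of which has mean
     * lifetime tau1 while the fraction 1 - a has mean lifetime tau2.
     */
    public static double[] decayTimes(int nNuclei, double a, double tau1, double tau2, long seed) {
        Random rng = new Random(seed);
        double[] times = new double[nNuclei];
        for (int i = 0; i < nNuclei; i++) {
            // First random number: choose the type of nucleus.
            double tau = (rng.nextDouble() < a) ? tau1 : tau2;
            // Second random number: exponential decay time by inverse transform.
            // 1 - nextDouble() lies in (0, 1], so the logarithm is finite.
            double u = 1.0 - rng.nextDouble();
            times[i] = -tau * Math.log(u);
        }
        return times;
    }

    public static void main(String[] args) {
        double[] t = decayTimes(100_000, 0.3, 1.0, 5.0, 2024L);
        double mean = 0.0;
        for (double ti : t) {
            mean += ti;
        }
        mean /= t.length;
        // Expected mean: a*tau1 + (1 - a)*tau2 = 0.3 + 3.5 = 3.8
        System.out.printf("sample mean = %.3f (expected 3.8)%n", mean);
    }
}
```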


Fig. 4.7: Line in the $(t, y)$-plane and simulated measured values with errors in $y$.


4.13 Java Classes and Example Programs

Java  Class  for  the  Generation  of  Random  Numbers 

DatanRandom contains methods for the generation of random numbers following various distributions, in particular DatanRandom.ecuy for the uniform, DatanRandom.standardNormal for the standard normal, and DatanRandom.multivariateNormal for the multivariate normal distribution. Further methods are used to illustrate a simple MLC generator or to demonstrate the following examples.
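A multiplicative linear congruential (MLC) generator computes the sequence $x_{j+1} = a x_j \bmod m$ and returns $x_j/m$ as a uniform random number. The following sketch is not taken from DatanRandom; it assumes the widely used "minimal standard" constants $a = 16807$ and $m = 2^{31} - 1$ of Park and Miller, for which the 10 000th value after the seed $x = 1$ is 1043618065, a convenient check of an implementation. The class name is hypothetical.

```java
/**
 * Multiplicative linear congruential generator x_{j+1} = a x_j mod m
 * (hypothetical sketch with the Park-Miller constants, not the DatanRandom code).
 */
public class SimpleMlc {
    private static final long A = 16807L;
    private static final long M = 2147483647L; // 2^31 - 1, a prime

    private long state;

    public SimpleMlc(long seed) {
        if (seed <= 0 || seed >= M) {
            throw new IllegalArgumentException("seed must lie in 1 .. m-1");
        }
        this.state = seed;
    }

    /** Advances the generator and returns the new integer state. */
    public long nextState() {
        state = (A * state) % M; // A * state < 2^46, so no overflow in a long
        return state;
    }

    /** Returns a uniform random number in the open interval (0, 1). */
    public double nextUniform() {
        return (double) nextState() / M;
    }

    public static void main(String[] args) {
        SimpleMlc g = new SimpleMlc(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(g.nextUniform());
        }
    }
}
```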

Example Program 4.1: The class E1Random demonstrates the generation
of  random  numbers 

One can choose interactively between three generators. After clicking on Go, 100 random numbers are generated and displayed. The seeds before and after generation are shown and can be changed interactively.

Example  Program  4.2:  The  class  E2Random  demonstrates  the  generation 
of  measurement  points,  scattering  about  a  straight  line 

Example  4.7  is  realized.  Parameter  input  is  interactive,  output  both  numerical  and 
graphical. 

Example Program 4.3: The class E3Random demonstrates the simulation
of  decay  times 

Example 4.8 is realized. Parameter input is interactive, output in the form of a histogram.

Example  Program  4.4:  The  class  E4Random  demonstrates  the  generation 
of  random  numbers  from  a  multivariate  normal  distribution 

The  procedure  of  Sect.  4.10  is  realized  for  the  case  of  two  variables.  Parameter  input 
is  interactive.  The  generated  number  pairs  are  displayed  numerically. 


5.  Some  Important  Distributions  and  Theorems 


We shall now discuss some specific distributions in detail. This chapter could therefore be regarded as a collection of examples. These distributions, however, are of great practical importance and are encountered in many applications. Moreover, their study will lead us to a number of important theorems.

5.1  The  Binomial  and  Multinomial  Distributions 

Consider an experiment having only two possible outcomes. The sample space can therefore be expressed as

$$E = A + \bar{A} \tag{5.1.1}$$

with the probabilities

$$P(A) = p ,\qquad P(\bar{A}) = 1 - p = q . \tag{5.1.2}$$

One now performs $n$ independent trials of the experiment defined by (5.1.1). One wishes to find the probability distribution for the quantity $x = \sum_{i=1}^{n} x_i$, where one has $x_i = 1$ (or 0) when $A$ (or $\bar{A}$) occurs as the result of the $i$th experiment.

The probability that the first $k$ trials result in $A$ and all of the rest in $\bar{A}$ is, using Eq. (2.3.8),

$$p^k q^{n-k} .$$


S. Brandt, Data Analysis: Statistical and Computational Methods for Scientists and Engineers,
DOI 10.1007/978-3-319-03762-2_5, © Springer International Publishing Switzerland 2014

Using the rules of combinatorics, the event "outcome $A$ $k$ times in $n$ trials" occurs in

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

different ways, according to the order of the occurrences of $A$ and $\bar{A}$ (see Appendix B). The probability of this event is therefore

$$P(k) = W_k^n = \binom{n}{k} p^k q^{n-k} . \tag{5.1.3}$$


We are interested in the mean value and variance of $x$. We first find these quantities for the variable $x_i$ of an individual event. According to (3.3.2) one has

$$E(x_i) = 1 \cdot p + 0 \cdot q = p \tag{5.1.4}$$

and

$$\sigma^2(x_i) = E\{(x_i - p)^2\} = (1-p)^2 p + (0-p)^2 q = q^2 p + p^2 q = pq . \tag{5.1.5}$$

From the generalization of (3.5.3) for $x = \sum x_i$ it follows that

$$E(x) = \sum_{i=1}^{n} p = np , \tag{5.1.6}$$

and from (3.5.10), since all of the covariances vanish because the $x_i$ are independent, one has

$$\sigma^2(x) = npq . \tag{5.1.7}$$
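Eqs. (5.1.3), (5.1.6), and (5.1.7) can be checked numerically by summing over all $k$. This minimal sketch (class name hypothetical) computes $W_k^n$ with the recursion $W_{k+1}^n = W_k^n \,\frac{n-k}{k+1}\,\frac{p}{q}$, which follows from (5.1.3) and avoids large factorials. Because the starting value $q^n$ can underflow for very large $n$, it is meant for moderate $n$ only.

```java
/** Binomial probabilities W_k^n (5.1.3) and a numerical check of (5.1.6), (5.1.7); hypothetical sketch. */
public class Binomial {

    /** Returns W_k^n for k = 0..n via W_{k+1} = W_k (n-k)/(k+1) p/q, starting from W_0 = q^n. */
    public static double[] pmf(int n, double p) {
        if (p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need 0 < p < 1");
        }
        double q = 1.0 - p;
        double[] w = new double[n + 1];
        w[0] = Math.pow(q, n);
        for (int k = 0; k < n; k++) {
            w[k + 1] = w[k] * (n - k) / (k + 1) * p / q;
        }
        return w;
    }

    /** Returns {total probability, mean, variance} of a distribution over k = 0, 1, ... */
    public static double[] moments(double[] w) {
        double sum = 0.0, mean = 0.0, second = 0.0;
        for (int k = 0; k < w.length; k++) {
            sum += w[k];
            mean += k * w[k];
            second += (double) k * k * w[k];
        }
        return new double[] {sum, mean, second - mean * mean};
    }

    public static void main(String[] args) {
        int n = 20;
        double p = 0.3;
        double[] m = moments(pmf(n, p));
        System.out.printf("sum = %.6f  mean = %.6f (np = %.1f)  var = %.6f (npq = %.2f)%n",
                m[0], m[1], n * p, m[2], n * p * (1 - p));
    }
}
```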

Figure 5.1 shows the distribution $W_k^n$ for various $n$ and fixed $p$, and Fig. 5.2 shows it for fixed $n$ and various values of $p$. Finally, in Fig. 5.3 $n$ and $p$ are both varied but the product $np$ is held constant. The figures will help us to see relationships between the binomial distribution (5.1.3) and other distributions.

A logical extension of the binomial distribution deals with experiments where more than two different outcomes are possible. Equation (5.1.1) is then replaced by

$$E = A_1 + A_2 + \cdots + A_\ell . \tag{5.1.8}$$

Let the probability for the outcome $A_j$ be

$$P(A_j) = p_j ,\qquad \sum_{j=1}^{\ell} p_j = 1 . \tag{5.1.9}$$

We consider again $n$ trials and ask for the probability that the outcome $A_j$ occurs $k_j$ times. This is given by

$$W^n_{k_1,k_2,\ldots,k_\ell} = \frac{n!}{k_1!\,k_2!\cdots k_\ell!}\; p_1^{k_1} p_2^{k_2}\cdots p_\ell^{k_\ell} ,\qquad \sum_{j=1}^{\ell} k_j = n . \tag{5.1.10}$$

Fig. 5.1: Binomial distribution for various values of $n$, with $p$ fixed.


The proof is left to the reader. The probability distribution (5.1.10) is called the multinomial distribution.

We can define a random variable $x_{ij}$ that takes on the value 1 when the $i$th trial leads to the outcome $A_j$, and is zero otherwise. In addition define $x_j = \sum_{i=1}^{n} x_{ij}$. The expectation value of $x_j$ is then

$$E(x_j) = \widehat{x}_j = n p_j . \tag{5.1.11}$$

Fig. 5.2: Binomial distribution for various values of $p$, with $n$ fixed.

The elements of the covariance matrix of the $x_j$ are

$$c_{ij} = n p_i (\delta_{ij} - p_j) . \tag{5.1.12}$$

The off-diagonal elements are clearly not zero. This was to be expected, since from Eq. (5.1.9) the variables $x_j$ are not independent.
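The negative off-diagonal covariances (5.1.12) are easy to see in a simulation. The sketch below, with a hypothetical class name and `java.util.Random` as the generator, draws many multinomial vectors $(k_1, \ldots, k_\ell)$ and compares the sample covariance matrix of the counts with $n p_i(\delta_{ij} - p_j)$.

```java
import java.util.Random;

/** Monte Carlo check of the multinomial covariance matrix (5.1.12); hypothetical example class. */
public class MultinomialCovariance {

    /** Performs n independent trials and returns the counts (k_1, ..., k_l) of the outcomes A_j. */
    static int[] draw(int n, double[] p, Random rng) {
        int[] k = new int[p.length];
        for (int trial = 0; trial < n; trial++) {
            double u = rng.nextDouble();
            int j = 0;
            double cumulative = p[0];
            // Outcome A_j is chosen with probability p_j; the last outcome absorbs rounding errors.
            while (u >= cumulative && j < p.length - 1) {
                j++;
                cumulative += p[j];
            }
            k[j]++;
        }
        return k;
    }

    /** Sample covariance matrix of the counts over the given number of repetitions. */
    public static double[][] sampleCovariance(int n, double[] p, int repetitions, long seed) {
        Random rng = new Random(seed);
        int l = p.length;
        double[] sum = new double[l];
        double[][] sumProd = new double[l][l];
        for (int r = 0; r < repetitions; r++) {
            int[] k = draw(n, p, rng);
            for (int i = 0; i < l; i++) {
                sum[i] += k[i];
                for (int j = 0; j < l; j++) {
                    sumProd[i][j] += (double) k[i] * k[j];
                }
            }
        }
        double[][] c = new double[l][l];
        for (int i = 0; i < l; i++) {
            for (int j = 0; j < l; j++) {
                double meanI = sum[i] / repetitions;
                double meanJ = sum[j] / repetitions;
                c[i][j] = sumProd[i][j] / repetitions - meanI * meanJ;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 50;
        double[] p = {0.2, 0.3, 0.5};
        double[][] c = sampleCovariance(n, p, 20_000, 99L);
        for (int i = 0; i < p.length; i++) {
            for (int j = 0; j < p.length; j++) {
                double exact = n * p[i] * ((i == j ? 1.0 : 0.0) - p[j]);
                System.out.printf("c[%d][%d] = %7.3f (exact %7.3f)   ", i, j, c[i][j], exact);
            }
            System.out.println();
        }
    }
}
```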


5.2  Frequency:  The  Law  of  Large  Numbers 


Usually the probabilities for the different types of events, e.g., $p_j$ in the case of the multinomial distribution, are not known but have to be obtained from experiment. One first measures the frequency of the events in $n$ experiments,

$$h_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} = \frac{x_j}{n} . \tag{5.2.1}$$

Unlike the probability, the frequency is a random quantity, since it depends on the outcomes of the $n$ individual experiments. By use of (5.1.11), (5.1.12), and (3.3.15) we obtain

$$E(h_j) = \widehat{h}_j = E\left(\frac{x_j}{n}\right) = p_j \tag{5.2.2}$$

and

$$\sigma^2(h_j) = \sigma^2\left(\frac{x_j}{n}\right) = \frac{1}{n^2}\,\sigma^2(x_j) = \frac{1}{n}\,p_j(1 - p_j) . \tag{5.2.3}$$

Fig. 5.3: Binomial distribution for various values of $n$ but fixed product $np$. For higher values of $n$ the distribution changes very little.


The product $p_j(1 - p_j)$ in Eq. (5.2.3) is at most 1/4, so the standard deviation of $h_j$ is always at most $1/(2\sqrt{n})$. One sees that the expectation value of the frequency of an event is exactly equal to the probability that the event will occur, and that the variance of the frequency about this expectation value can be made arbitrarily small by increasing the number of trials. This property of the frequency is known as the law of large numbers. It is the basis of the frequency definition of probability given by Eq. (2.2.1).
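The shrinking spread of the frequency can be observed directly. This is a hedged sketch (hypothetical class name, `java.util.Random` as generator) that repeats $n$ trials many times, computes $h = k/n$ each time, and compares the sample standard deviation of $h$ with $\sqrt{p(1-p)/n}$ from (5.2.3) and with the bound $1/(2\sqrt{n})$ that follows from $p(1-p) \le 1/4$.

```java
import java.util.Random;

/** Spread of the frequency h = k/n of an event with probability p, cf. Eq. (5.2.3); hypothetical sketch. */
public class FrequencySpread {

    /** Returns the sample standard deviation of h over the given number of repetitions. */
    public static double sampleStdOfFrequency(double p, int n, int repetitions, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0, sumSq = 0.0;
        for (int r = 0; r < repetitions; r++) {
            int k = 0;
            for (int i = 0; i < n; i++) {
                if (rng.nextDouble() < p) {
                    k++; // the event occurred in this trial
                }
            }
            double h = (double) k / n;
            sum += h;
            sumSq += h * h;
        }
        double mean = sum / repetitions;
        return Math.sqrt(sumSq / repetitions - mean * mean);
    }

    public static void main(String[] args) {
        double p = 0.3;
        for (int n : new int[] {10, 100, 1000}) {
            double simulated = sampleStdOfFrequency(p, n, 2000, 11L);
            double predicted = Math.sqrt(p * (1 - p) / n);
            System.out.printf("n = %5d  sigma(h): simulated %.4f, predicted %.4f, bound %.4f%n",
                    n, simulated, predicted, 0.5 / Math.sqrt(n));
        }
    }
}
```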

Frequently  the  purpose  of  an  experimental  investigation  is  to  determine 
the  probability  for  the  occurrence  of  a  certain  type  of  event.  According 
to  (5.2.2)  we  can  use  the  frequency  as  an  approximation  of  the  probability. 
The  square  of  the  error  of  this  approximation  is  then  inversely  proportional  to 
the  number  of  individual  experiments.  This  kind  of  error,  which  originates 
from  the  fact  that  only  a  finite  number  of  experiments  can  be  performed, 
is  called  the  statistical  error.  It  is  of  prime  importance  for  applications  that 
are  concerned  with  the  counting  of  individual  events,  e.g.,  nuclear  particles 
passing through a counter, animals with certain traits in heredity experiments,
defective  items  in  quality  control,  and  so  forth. 

Example  5.1:  Statistical  error 

Suppose it is known from earlier experiments that a fraction $R \approx 1/200$ of a sample of fruit flies (Drosophila) develop a certain property A if exposed to a given dose of X-rays. An experiment is planned to determine the fraction $R$ with an accuracy of 1%. How large must the original sample be in order to achieve this accuracy?

We use Eq. (5.2.3) and find $p_j = 0.005$, $(1 - p_j) \approx 1$. We must now choose $n$ such that $\sigma(h_j)/h_j = 200\,\sigma(h_j) = 0.01$. This gives $\sigma(h_j) = 0.00005$ and $\sigma^2(h_j) = 0.25 \times 10^{-8}$. Equation (5.2.3) gives

$$0.25 \times 10^{-8} = \frac{1}{n} \times 0.005$$

and therefore

$$n = 2 \times 10^{6} .$$

A total of two million fruit flies would have to be used. This is practically impossible. To determine the fraction $R$ with an accuracy of 10% would require 20 000 flies. ■
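The arithmetic of Example 5.1 generalizes to any probability $p$ and relative accuracy $r$. Solving $\sigma(h)/p = r$ with (5.2.3) gives $n = (1-p)/(p r^2)$; the example replaces the factor $(1-p)$ by 1. A minimal sketch with a hypothetical class name:

```java
/** Sample size needed to measure a probability p with relative accuracy r, from Eq. (5.2.3); hypothetical sketch. */
public class RequiredSampleSize {

    /** From sigma^2(h)/p^2 = (1 - p)/(n p) = r^2 it follows that n = (1 - p)/(p r^2). */
    public static double exact(double p, double relativeAccuracy) {
        return (1.0 - p) / (p * relativeAccuracy * relativeAccuracy);
    }

    /** Approximation used in Example 5.1, with (1 - p) replaced by 1. */
    public static double approximate(double p, double relativeAccuracy) {
        return 1.0 / (p * relativeAccuracy * relativeAccuracy);
    }

    public static void main(String[] args) {
        double p = 0.005; // R = 1/200
        for (double r : new double[] {0.01, 0.10}) {
            System.out.printf("r = %4.2f  n (approx.) = %.0f   n (exact) = %.0f%n",
                    r, approximate(p, r), exact(p, r));
        }
    }
}
```

For $p = 0.005$ and $r = 1\%$ the approximation gives $2 \times 10^6$ and the exact expression $1.99 \times 10^6$; for $r = 10\%$ the approximation gives 20 000, as in the example.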

5.3  The  Hypergeometric  Distribution 

Although we shall rigorously introduce the concept of random sampling at a later point, we will now discuss a typical problem of sampling. We consider a container (we shall not break with the habit of mathematicians of calling such a container an urn) with $K$ white and $L = N - K$ black balls. We want to determine the probability that in drawing $n$ balls (without replacing them) we will find exactly $k$ white and $\ell = n - k$ black ones. The problem is rendered difficult by the fact that the drawing of a ball of a particular color changes the ratio of white and black balls and therefore influences the outcome of the next draw. One clearly has $\binom{N}{n}$ equally likely ways to choose $n$ out of $N$ balls. The probability that one of these possibilities will occur is therefore $1/\binom{N}{n}$. There are $\binom{K}{k}$ ways to choose $k$ of the $K$ white balls, and $\binom{L}{\ell}$ ways to choose $\ell$ of the $L$ black ones. The required probability is therefore

$$W_k = \frac{\binom{K}{k}\binom{L}{\ell}}{\binom{N}{n}} . \tag{5.3.1}$$

As in Sect. 5.1 we define the random variable $x = \sum_{i=1}^{n} x_i$, with $x_i = 1$ when the $i$th draw results in a white ball, and $x_i = 0$ otherwise. (In other words, we define $k$ as the random variable $x$.)

To compute the expectation value of $x$ we return to the definition (3.3.2). (Adding the expectation values of the $x_i$ would also be permissible, since that rule does not require independence; the lack of independence of the $x_i$ matters for the variance.) One has

$$E(x) = \sum_{i=1}^{n} i\,\frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}} = \frac{(N-n)!\,n!}{N!}\sum_{i=1}^{n}\frac{i\,K!\,(N-K)!}{i!\,(K-i)!\,(n-i)!\,(N-K-n+i)!}$$
$$= \frac{n\,(n-1)!\,(N-n)!}{N\,(N-1)!}\sum_{i=1}^{n}\frac{K\,(K-1)!}{(i-1)!\,(K-1-(i-1))!}\times\frac{(N-K)!}{(n-1-(i-1))!\,(N-K-(n-1)+(i-1))!} .$$

If we substitute $i - 1 = j$, this gives

$$E(x) = n\,\frac{K}{N}\,\frac{(n-1)!\,(N-n)!}{(N-1)!}\sum_{j=0}^{n-1}\frac{(K-1)!\,(N-K)!}{j!\,(K-1-j)!\,(n-1-j)!\,(N-K-(n-1)+j)!}$$
$$= n\,\frac{K}{N}\binom{N-1}{n-1}^{-1}\sum_{j=0}^{n-1}\binom{K-1}{j}\binom{N-K}{n-1-j} .$$

With Eq. (B.5) the sum equals $\binom{N-1}{n-1}$, and we obtain

$$E(x) = n\,\frac{K}{N} . \tag{5.3.2}$$

The calculation of the variance follows along the same lines but is rather lengthy. The result is

$$\sigma^2(x) = \frac{n K (N-K)(N-n)}{N^2 (N-1)} . \tag{5.3.3}$$
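Eqs. (5.3.2) and (5.3.3) can be verified by summing directly over the probabilities (5.3.1). In this minimal sketch (hypothetical class name) the binomial coefficients are computed as doubles by a product formula, which is adequate for moderate $N$; the parameter names `bigN` and `bigK` stand for $N$ and $K$.

```java
/** Hypergeometric probabilities W_k (5.3.1) and a check of (5.3.2), (5.3.3); hypothetical sketch. */
public class Hypergeometric {

    /** Binomial coefficient C(a, b) as a double; 0 if b < 0 or b > a. */
    static double binom(int a, int b) {
        if (b < 0 || b > a) {
            return 0.0;
        }
        double c = 1.0;
        for (int i = 1; i <= b; i++) {
            c = c * (a - b + i) / i; // after step i, c = C(a - b + i, i)
        }
        return c;
    }

    /** W_k = C(K, k) C(N - K, n - k) / C(N, n) for k = 0..n. */
    public static double[] pmf(int bigN, int bigK, int n) {
        double[] w = new double[n + 1];
        double total = binom(bigN, n);
        for (int k = 0; k <= n; k++) {
            w[k] = binom(bigK, k) * binom(bigN - bigK, n - k) / total;
        }
        return w;
    }

    public static void main(String[] args) {
        int bigN = 50, bigK = 20, n = 10;
        double[] w = pmf(bigN, bigK, n);
        double sum = 0.0, mean = 0.0, second = 0.0;
        for (int k = 0; k <= n; k++) {
            sum += w[k];
            mean += k * w[k];
            second += (double) k * k * w[k];
        }
        double var = second - mean * mean;
        double meanFormula = (double) n * bigK / bigN;
        double varFormula = (double) n * bigK * (bigN - bigK) * (bigN - n)
                / ((double) bigN * bigN * (bigN - 1));
        System.out.printf("sum = %.6f  mean = %.6f (%.6f)  var = %.6f (%.6f)%n",
                sum, mean, meanFormula, var, varFormula);
    }
}
```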


Figures 5.4 and 5.5 depict several examples of the distribution. If $n \ll N$, then drawing a white ball has little influence on the probabilities for the next draw. We therefore expect that in this case $W_k$ behaves in a manner similar to a binomial distribution with $p = K/N$ and $q = (N-K)/N$. This is also made clear by the similarity of Figs. 5.5 and 5.1. One obtains in fact the same expectation value,

$$E(x) = n\,\frac{K}{N} = np ,$$

as for the binomial distribution. The variance is then

$$\sigma^2(x) = \frac{npq\,(N-n)}{N-1} ,$$

which for the case $n \ll N$ becomes

$$\sigma^2(x) \approx npq .$$

Fig. 5.4: Hypergeometric distribution for various values of $n$ and small values of $K$ and $N$.


There are many applications of the hypergeometric distribution. Opinion polls, quality control, and so forth are all based on the experimental scheme of taking (polling) an object without replacing it in the original sample or population. The distribution can be generalized in two ways. First, we can of course consider more than two properties (white and black balls). This is analogous to the transition from the binomial to the multinomial distribution. The original sample (population) contains $N$ elements, each of which possesses one of $\ell$ properties,

$$N = N_1 + N_2 + \cdots + N_\ell .$$

The probability that $n$ draws (without replacement) will be composed as

$$n = n_1 + n_2 + \cdots + n_\ell$$

is, in analogy to Eq. (5.3.1),

$$W_{n_1,n_2,\ldots,n_\ell} = \frac{\binom{N_1}{n_1}\binom{N_2}{n_2}\cdots\binom{N_\ell}{n_\ell}}{\binom{N}{n}} . \tag{5.3.4}$$

Fig. 5.5: Hypergeometric distribution for various values of $n$ and for large values of $K$ and $N$.


Another extension of the hypergeometric distribution is obtained in the following way. We saw earlier that consecutive drawings ceased to be independent because the balls were not replaced. If each time we draw a ball of one type we now place additional balls of that type back into the urn, this dependence is enhanced. One then obtains the Pólya distribution. It is of importance in the study of epidemic diseases, where the appearance of a case of the disease enhances the probability of future cases.


Example  5.2:  Application  of  the  hypergeometric  distribution  for 
determination  of  zoological  populations 

From a pond $K$ fish are taken and marked. They are then returned to the pond. After a short while $n$ fish are caught, $k$ of which are found to be marked. Before the second catch the pond contains a total of $N$ fish, of which $K$ are marked. The probability of finding $k$ marked fish among the $n$ removed is given by Eq. (5.3.1). We will return to this problem in Example 7.3. ■

5.4 The Poisson Distribution


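Looking at Fig. 5.3 it appears that if $n$ tends to infinity but at the same time $np = \lambda$ is kept constant, the binomial distribution approaches a certain fixed distribution. We rewrite Eq. (5.1.3) as

$$W_k^n = \binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k}$$
$$= \frac{\lambda^k}{k!}\,\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}\,\frac{\left(1-\frac{\lambda}{n}\right)^n}{\left(1-\frac{\lambda}{n}\right)^k}$$
$$= \frac{\lambda^k}{k!}\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)\frac{\left(1-\frac{\lambda}{n}\right)^n}{\left(1-\frac{\lambda}{n}\right)^k} .$$

In the limiting case the factors $(1 - j/n)$, $j = 1, \ldots, k-1$, and the denominator $(1 - \lambda/n)^k$ all approach unity. In addition one has

$$\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^n = e^{-\lambda} ,$$

so that in the limit one has

$$\lim_{n\to\infty} W_k^n = f(k) = \frac{\lambda^k}{k!}\,e^{-\lambda} . \tag{5.4.1}$$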

The quantity $f(k)$ is the probability of the Poisson distribution. It is plotted in Fig. 5.6 for various values of $\lambda$. As is the case for the other distributions we have encountered so far, the Poisson distribution is only defined for integer values of $k$.
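The approach of the binomial distribution to the Poisson limit (5.4.1) can be followed numerically. The sketch below (hypothetical class name) keeps $\lambda = np$ fixed, evaluates $W_k^n$ in logarithmic form so that the binomial coefficient does not overflow for large $n$, and compares it with $f(k)$.

```java
/** Binomial probabilities with np = lambda fixed versus the Poisson limit (5.4.1); hypothetical sketch. */
public class PoissonLimit {

    /** Poisson probability f(k) = lambda^k e^{-lambda} / k!, computed by recursion. */
    public static double poisson(int k, double lambda) {
        double f = Math.exp(-lambda);
        for (int j = 1; j <= k; j++) {
            f *= lambda / j;
        }
        return f;
    }

    /** Binomial probability W_k^n with p = lambda / n, computed in log space (requires n >= k). */
    public static double binomial(int k, int n, double lambda) {
        double p = lambda / n;
        double logW = 0.0;
        for (int j = 0; j < k; j++) {
            logW += Math.log((double) (n - j) / (j + 1)); // log C(n, k), term by term
        }
        logW += k * Math.log(p) + (n - k) * Math.log1p(-p);
        return Math.exp(logW);
    }

    public static void main(String[] args) {
        double lambda = 3.0;
        int k = 2;
        for (int n : new int[] {10, 100, 1000, 100_000}) {
            System.out.printf("n = %6d  W_k^n = %.6f%n", n, binomial(k, n, lambda));
        }
        System.out.printf("Poisson   f(k)  = %.6f%n", poisson(k, lambda));
    }
}
```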

The distribution satisfies the requirement that the total probability is equal to unity,

$$\sum_{k=0}^{\infty} f(k) = e^{-\lambda}\left(\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}\right) = 1 . \tag{5.4.2}$$

The expression in parentheses is in fact the Taylor expansion of $e^{\lambda}$.

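We now want to determine the mean, variance, and skewness of the Poisson distribution. The definition (3.3.2) gives

$$E(k) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k}{k!}\,e^{-\lambda} = \sum_{k=1}^{\infty} k\,\frac{\lambda^k}{k!}\,e^{-\lambda} = \lambda\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}\,e^{-\lambda} = \lambda\sum_{j=0}^{\infty}\frac{\lambda^j}{j!}\,e^{-\lambda}$$

and, using this with (5.4.2),

$$E(k) = \lambda . \tag{5.4.3}$$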


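We would now like to find $E(k^2)$. One obtains in a corresponding way

$$E(k^2) = \sum_{k=0}^{\infty} k^2\,\frac{\lambda^k}{k!}\,e^{-\lambda} = \lambda\sum_{k=1}^{\infty} k\,\frac{\lambda^{k-1}}{(k-1)!}\,e^{-\lambda} = \lambda\sum_{j=0}^{\infty}(j+1)\,\frac{\lambda^j}{j!}\,e^{-\lambda} = \lambda\left(\sum_{j=0}^{\infty} j\,\frac{\lambda^j}{j!}\,e^{-\lambda} + \sum_{j=0}^{\infty}\frac{\lambda^j}{j!}\,e^{-\lambda}\right)$$

and therefore

$$E(k^2) = \lambda(\lambda + 1) . \tag{5.4.4}$$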


We will use Eqs. (5.4.3) and (5.4.4) to compute the variance. According to Eq. (3.3.16) one has

$$\sigma^2(k) = E(k^2) - \{E(k)\}^2 = \lambda(\lambda + 1) - \lambda^2 \tag{5.4.5}$$

or

$$\sigma^2(k) = \lambda . \tag{5.4.6}$$

We now consider the skewness (3.3.13) of the Poisson distribution. Following Sect. 3.3 we easily find that

$$\mu_3 = E\{(k - \widehat{k})^3\} = \lambda .$$

Fig. 5.6: Poisson distribution for various values of $\lambda$.

The skewness (3.3.13) is then

$$\gamma = \frac{\mu_3}{\sigma^3} = \frac{\lambda}{\lambda^{3/2}} = \frac{1}{\sqrt{\lambda}} , \tag{5.4.7}$$

that is, the Poisson distribution becomes increasingly symmetric as $\lambda$ increases. Figure 5.6 shows the distribution for various values of $\lambda$. In particular the distribution with $\lambda = 3$ should be compared with Fig. 5.3.
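The results (5.4.3), (5.4.6), and (5.4.7) can be confirmed by summing the series numerically. This is a minimal sketch with a hypothetical class name; it truncates the sums at a $k_{\max}$ far beyond $\lambda$, where the neglected terms are negligible, and obtains $\mu_3$ from the moments about the origin via $\mu_3 = E(k^3) - 3\widehat{k}E(k^2) + 2\widehat{k}^3$.

```java
/** Numerical check of the Poisson mean (5.4.3), variance (5.4.6) and skewness (5.4.7); hypothetical sketch. */
public class PoissonMoments {

    /** Returns {mean, variance, skewness} from f(k) summed over k = 0..kMax. */
    public static double[] moments(double lambda, int kMax) {
        double f = Math.exp(-lambda); // f(0)
        double m1 = 0.0, m2 = 0.0, m3 = 0.0;
        for (int k = 0; k <= kMax; k++) {
            if (k > 0) {
                f *= lambda / k; // recursion f(k) = f(k-1) * lambda / k
            }
            m1 += k * f;
            m2 += (double) k * k * f;
            m3 += (double) k * k * k * f;
        }
        double variance = m2 - m1 * m1;
        double mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 * m1 * m1; // third central moment
        double skewness = mu3 / Math.pow(variance, 1.5);
        return new double[] {m1, variance, skewness};
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {1.0, 3.0, 10.0}) {
            double[] m = moments(lambda, 200);
            System.out.printf("lambda = %4.1f  mean = %.4f  var = %.4f  skew = %.4f (1/sqrt(lambda) = %.4f)%n",
                    lambda, m[0], m[1], m[2], 1.0 / Math.sqrt(lambda));
        }
    }
}
```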

We have obtained the Poisson distribution from the binomial distribution with large $n$ but constant $\lambda = np$, i.e., small $p$. We therefore expect it to apply to processes in which a large number of events occur but of which only very few have a certain property of interest to us (i.e., a large number of "trials" but few "successes").


Example  5.3:  Poisson  distribution  and  independence  of  radioactive  decays 

We consider a radioactive nucleus with mean lifetime $\tau$ and observe it for a time $T \ll \tau$. The probability that it decays within this time interval is $W \ll 1$. We break the observation time $T$ into $n$ smaller time intervals of length $t$, so that $T = nt$. The probability for the nucleus to decay in a particular time interval is $p \approx W/n$. We now observe, for a total time $T$, a radioactive source containing $N$ nuclei which decay independently of each other, and detect $d_1$ decays in time interval 1, $d_2$ decays in interval 2, etc. Let $h(k)$ be the frequency with which exactly $k$ decays are observed in an interval $(k = 0, 1, \ldots)$. That is, if $n_k$ is the number of intervals with $k$ decays, then $h(k) = n_k/n$. In the limit $N \to \infty$ and for large $n$ the frequency distribution $h(k)$ becomes the probability distribution (5.4.1). The statistical nature of radioactive decay was established in this way in a famous experiment by Rutherford and Geiger. ■

Similarly,  the  frequency  of  finding  k  stars  per  element  of  the  celestial 
sphere  or  k  raisins  per  volume  element  of  a  fruit  cake  is  distributed  according 
to  the  Poisson  law,  but  not,  however,  the  frequency  of  finding  k  animals  of  a 
given  species  per  element  of  area,  at  least  if  these  animals  live  in  herds,  since 
in  this  case  the  assumption  of  independence  is  not  fulfilled. 

As a quantitative example of the Poisson distribution many textbooks discuss the number of Prussian cavalrymen killed by horse kicks during a period of 20 years, an example originally due to von Bortkiewicz [6]. We prefer to turn our attention to a somewhat less macabre example taken from a lecture by De Solla Price [7].

Example  5.4:  Poisson  distribution  and  the  independence  of  scientific 
discoveries 

The author first constructs the model of an apple tree with 1000 apples and 1000 blindfolded pickers who each try at the same time to pick an apple. Since we are dealing with a model, they do not hinder each other, but it can happen that two or more of them will attempt to pick the same apple at the same time. The number of apples grabbed simultaneously by $k$ people $(k = 0, 1, 2, \ldots)$ follows a Poisson distribution. De Solla Price determined that the number of scientific discoveries made independently twice, three times, etc. is also distributed according to the Poisson law, in a way similar to the principle of the blindfolded apple pickers (Table 5.1). One gets the impression that scientists are not concerned with the activities of their colleagues. De Solla Price believes that this can be explained by the assumption that scientists have a strong urge to write papers but feel only a relatively mild need to read them. ■
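The prediction column of Table 5.1 can be reproduced under the assumption that the model corresponds to $\lambda = 1$: with 1000 pickers spread at random over 1000 apples, on average one picker reaches for each apple, and the expected number of apples grabbed by $k$ pickers is $1000 f(k)$. A minimal sketch (hypothetical class name), in which the last entry gives the expected number for $k \ge 6$:

```java
/** Poisson predictions for the apple-picker model of Example 5.4 (hypothetical sketch). */
public class ApplePickers {

    /**
     * Returns nApples * f(k) for k = 0..kMax-1 and, as the last entry,
     * nApples times the tail probability P(k >= kMax).
     */
    public static double[] expectedCounts(int nApples, double lambda, int kMax) {
        double[] counts = new double[kMax + 1];
        double f = Math.exp(-lambda); // f(0)
        double cumulative = 0.0;
        for (int k = 0; k < kMax; k++) {
            if (k > 0) {
                f *= lambda / k; // f(k) = f(k-1) * lambda / k
            }
            counts[k] = nApples * f;
            cumulative += f;
        }
        counts[kMax] = nApples * (1.0 - cumulative);
        return counts;
    }

    public static void main(String[] args) {
        double[] c = expectedCounts(1000, 1.0, 6);
        for (int k = 0; k < 6; k++) {
            System.out.printf("k = %d    %.0f%n", k, c[k]);
        }
        System.out.printf("k >= 6   %.0f%n", c[6]);
    }
}
```

Rounded, the values are 368, 368, 184, 61, 15, 3, and 1, in agreement with the last column of Table 5.1.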

5.5  The  Characteristic  Function  of  a  Distribution 

So far we have only considered real random variables. In fact, in Sect. 3.1 we introduced the concept of a random quantity as a real number associated with an event. Without changing this concept we can formally construct a complex random variable from two real ones by writing

$$z = x + iy . \tag{5.5.1}$$

Table 5.1: Simultaneous discovery and the Poisson distribution.

Number of simultaneous   Cases of simultaneous   Prediction of
discoveries              discovery               Poisson distribution
0                        Not defined             368
1                        Not known               368
2                        179                     184
3                        51                      61
4                        17                      15
5                        6                       3
≥ 6                      8                       1

As its expectation value we define

$$E(z) = E(x) + iE(y) . \tag{5.5.2}$$

By  analogy  with  real  variables,  complex  random  variables  are  independent  if 
the  real  and  imaginary  parts  are  independent  among  themselves. 

If $x$ is a real random variable with distribution function $F(x) = P(\mathbf{x} < x)$ and probability density $f(x)$, we define its characteristic function to be the expectation value of the quantity $\exp(itx)$:

$$\varphi(t) = E\{\exp(itx)\} . \tag{5.5.3}$$

That is, in the case of a continuous variable the characteristic function is a Fourier integral with its known transformation properties:

$$\varphi(t) = \int_{-\infty}^{\infty}\exp(itx)\,f(x)\,dx . \tag{5.5.4}$$

For a discrete variable we obtain instead from (3.3.2)

$$\varphi(t) = \sum_i \exp(itx_i)\,P(\mathbf{x} = x_i) . \tag{5.5.5}$$

We now consider the moments of $x$ about the origin,

$$\lambda_n = E(x^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx , \tag{5.5.6}$$

and find that $\lambda_n$ can be obtained simply by differentiating the characteristic function $n$ times at the point $t = 0$:

$$\varphi^{(n)}(t) = \frac{d^n\varphi(t)}{dt^n} = i^n\int_{-\infty}^{\infty} x^n\exp(itx)\,f(x)\,dx ,$$

and therefore

$$\varphi^{(n)}(0) = i^n\lambda_n . \tag{5.5.7}$$


If we now introduce the simple coordinate translation

$$y = x - \widehat{x} \tag{5.5.8}$$

and construct the characteristic function

$$\varphi_y(t) = \int_{-\infty}^{\infty}\exp\{it(x - \widehat{x})\}\,f(x)\,dx = \varphi(t)\exp(-it\widehat{x}) , \tag{5.5.9}$$

then its $n$th derivative at $t = 0$ is (up to a power of $i$) equal to the $n$th moment of $x$ about the expectation value [cf. (3.3.8)]:

$$\varphi_y^{(n)}(0) = i^n\mu_n = i^n E\{(x - \widehat{x})^n\} , \tag{5.5.10}$$

and in particular

$$\sigma^2(x) = -\varphi_y''(0) . \tag{5.5.11}$$


Inverting the Fourier transform (5.5.4) we see that it is possible to obtain the probability density from the characteristic function,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(-itx)\,\varphi(t)\,dt . \tag{5.5.12}$$


It is possible to show that a distribution is determined uniquely by its characteristic function. This is the case even for discrete variables, for which the probability density is not defined and (5.5.12) cannot be used; one then has, for points $a$ and $b$ at which $F$ is continuous,

$$F(b) - F(a) = \frac{i}{2\pi}\int_{-\infty}^{\infty}\frac{\exp(-itb) - \exp(-ita)}{t}\,\varphi(t)\,dt . \tag{5.5.13}$$

Often it is more convenient to use the characteristic function rather than the original distribution. Because of the unique relation between the two it is possible to switch back and forth at any point in the course of a calculation.

We now consider the sum of two independent random variables

$$w = x + y .$$

Its characteristic function is

$$\varphi_w(t) = E[\exp\{it(x + y)\}] = E\{\exp(itx)\exp(ity)\} .$$

Generalizing relation (3.5.13) to complex variables we obtain

$$\varphi_w(t) = E\{\exp(itx)\}\,E\{\exp(ity)\} = \varphi_x(t)\,\varphi_y(t) , \tag{5.5.14}$$

i.e., the characteristic function of a sum of independent random variables is equal to the product of their respective characteristic functions.


Example  5.5:  Addition  of  two  Poisson  distributed  variables  with  use  of  the 
characteristic  function 

From Eqs. (5.5.5) and (5.4.1) one obtains for the characteristic function of the Poisson distribution

$$\varphi(t) = \sum_{k=0}^{\infty}\exp(itk)\,\frac{\lambda^k}{k!}\exp(-\lambda) = \exp(-\lambda)\sum_{k=0}^{\infty}\frac{(\lambda e^{it})^k}{k!} = \exp(-\lambda)\exp(\lambda e^{it}) = \exp\{\lambda(e^{it} - 1)\} . \tag{5.5.15}$$

We now form the characteristic function of the sum of two independent Poisson distributed variables with mean values $\lambda_1$ and $\lambda_2$,

$$\varphi_{\mathrm{sum}}(t) = \exp\{\lambda_1(e^{it} - 1)\}\exp\{\lambda_2(e^{it} - 1)\} = \exp\{(\lambda_1 + \lambda_2)(e^{it} - 1)\} . \tag{5.5.16}$$

This is again of the form of Eq. (5.5.15). Therefore the sum of two independent Poisson distributed variables is itself Poisson distributed. Its mean is the sum of the means of the individual distributions. ■
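Eq. (5.5.15) and the addition property (5.5.16) can be checked numerically. Java has no built-in complex type, so this sketch (hypothetical class name) carries real and imaginary parts as two-element arrays, compares the truncated sum (5.5.5) with the closed form, and multiplies two characteristic functions.

```java
/** Poisson characteristic function: term-by-term sum (5.5.5) versus closed form (5.5.15); hypothetical sketch. */
public class PoissonCharacteristic {

    /** Returns {Re, Im} of sum_k exp(itk) f(k), truncated at kMax. */
    public static double[] bySummation(double lambda, double t, int kMax) {
        double f = Math.exp(-lambda); // f(0)
        double re = 0.0, im = 0.0;
        for (int k = 0; k <= kMax; k++) {
            if (k > 0) {
                f *= lambda / k;
            }
            re += Math.cos(t * k) * f;
            im += Math.sin(t * k) * f;
        }
        return new double[] {re, im};
    }

    /** Returns {Re, Im} of exp{lambda (e^{it} - 1)} = exp(lambda (cos t - 1)) * exp(i lambda sin t). */
    public static double[] closedForm(double lambda, double t) {
        double modulus = Math.exp(lambda * (Math.cos(t) - 1.0));
        double phase = lambda * Math.sin(t);
        return new double[] {modulus * Math.cos(phase), modulus * Math.sin(phase)};
    }

    public static void main(String[] args) {
        double l1 = 2.0, l2 = 3.5, t = 0.7;
        double[] a = closedForm(l1, t);
        double[] b = closedForm(l2, t);
        // Complex product phi_1(t) * phi_2(t), cf. Eq. (5.5.16)
        double re = a[0] * b[0] - a[1] * b[1];
        double im = a[0] * b[1] + a[1] * b[0];
        double[] direct = bySummation(l1 + l2, t, 200);
        System.out.printf("product: %.10f %+.10fi%n", re, im);
        System.out.printf("direct : %.10f %+.10fi%n", direct[0], direct[1]);
    }
}
```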


5.6  The  Standard  Normal  Distribution 

The probability density of the standard normal distribution is defined as

$$f(x) = \phi_0(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} . \tag{5.6.1}$$

This function is depicted in Fig. 5.7a. It has a bell shape with the maximum at $x = 0$. From Appendix D.1 we have

$$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi} , \tag{5.6.2}$$

so that $\phi_0(x)$ is normalized to one as required. Using the symmetry of Fig. 5.7a or, alternatively, the antisymmetry of the integrand, we conclude that the expectation value is

$$\widehat{x} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x\,e^{-x^2/2}\,dx = 0 . \tag{5.6.3}$$

Fig. 5.7: Probability density (a) and distribution function (b) of the standard normal distribution.
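
By integrating by parts we can compute the variance to be

$$\sigma^2(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\left[-x\,e^{-x^2/2}\right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}}\left\{\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right\} = 1 , \tag{5.6.4}$$

since the expression in the square brackets vanishes at the integral's boundaries and the integral in curly brackets is given by Eq. (5.6.2).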

The distribution function of the standard normal distribution

$$F(x) = \psi_0(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt \tag{5.6.5}$$

is shown in Fig. 5.7b. It cannot be expressed in closed form in terms of elementary functions. It is tabulated numerically in Appendix C.4.
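Although $\psi_0(x)$ has no closed form, it is easy to evaluate numerically. The sketch below is not the method of Appendix C.4: it uses the symmetry of $\phi_0$ to write $\psi_0(x) = 1/2 + \int_0^x \phi_0(t)\,dt$ and evaluates that integral with Simpson's rule. The class name and the fixed number of subintervals are assumptions chosen for illustration.

```java
/** Numerical evaluation of the standard normal distribution function psi_0(x), Eq. (5.6.5); hypothetical sketch. */
public class StandardNormalCdf {

    /** Standard normal density phi_0(t), Eq. (5.6.1). */
    static double density(double t) {
        return Math.exp(-0.5 * t * t) / Math.sqrt(2.0 * Math.PI);
    }

    /**
     * psi_0(x) = 1/2 + integral from 0 to x of phi_0(t) dt, using the symmetry of phi_0.
     * The integral is evaluated with Simpson's rule on m (even) subintervals.
     */
    public static double cdf(double x) {
        final int m = 2000;
        double h = x / m; // negative for x < 0, which gives the correct sign of the integral
        double s = density(0.0) + density(x);
        for (int i = 1; i < m; i++) {
            s += (i % 2 == 1 ? 4.0 : 2.0) * density(i * h);
        }
        return 0.5 + s * h / 3.0;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.0, 1.0, 1.96, 3.0}) {
            System.out.printf("psi_0(%.2f) = %.6f%n", x, cdf(x));
        }
    }
}
```

For example, `cdf(1.0)` is about 0.841345, and `cdf(1.96)` is about 0.975.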


5.7  The  Normal  or  Gaussian  Distribution 


The standardized distribution of the last section had the properties $\widehat{x} = E(x) = 0$, $\sigma^2(x) = 1$, i.e., the variable $x$ had the properties of the standardized variable $u$ in Eq. (3.3.17). If we now replace $x$ by $(x - a)/b$ in (5.6.1) and divide by $b$ to preserve the normalization, we obtain the probability density of the normal or Gaussian distribution,

$$f(x) = \phi(x) = \frac{1}{\sqrt{2\pi}\,b}\exp\left(-\frac{(x-a)^2}{2b^2}\right) \tag{5.7.1}$$

with

$$\widehat{x} = a ,\qquad \sigma^2(x) = b^2 . \tag{5.7.2}$$


The characteristic function of the normal distribution (5.7.1) is, using Eq. (5.5.4),

$$\varphi(t) = \frac{1}{\sqrt{2\pi}\,b}\int_{-\infty}^{\infty}\exp(itx)\exp\left(-\frac{(x-a)^2}{2b^2}\right)dx . \tag{5.7.3}$$

With $u = (x - a)/b$ one obtains

$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}u^2 + it(bu + a)\right\}du = \frac{\exp(ita)}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}u^2 + itbu\right\}du . \tag{5.7.4}$$

By completing the square the integral can be rewritten as

$$\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}u^2 + itbu\right\}du = \int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}(u - itb)^2 - \tfrac{1}{2}t^2b^2\right\}du = \exp\left(-\tfrac{1}{2}t^2b^2\right)\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}(u - itb)^2\right\}du . \tag{5.7.5}$$

With $r = u - itb$ the last integral takes on the form

$$\int_{-\infty - itb}^{\infty - itb}\exp\left\{-\tfrac{1}{2}r^2\right\}dr .$$


The integrand does not have any singularities in the complex $r$ plane. According to the residue theorem, therefore, the contour integral around any closed path vanishes. Consider a path that runs along the real axis from $r = -L$ to $r = L$, then parallel to the imaginary axis from $r = L$ to $r = L - itb$, from there antiparallel to the real axis to $r = -L - itb$, and finally back to the starting point $r = -L$. In the limit $L \to \infty$ the integrand vanishes along the parts of the path that run parallel to the imaginary axis. One then has

$$\int_{-\infty - itb}^{\infty - itb}\exp\left\{-\tfrac{1}{2}r^2\right\}dr = \int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}r^2\right\}dr ,$$

i.e., the path of integration can be shifted onto the real axis. The integral is computed in Appendix D.1 and has the value

$$\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}r^2\right\}dr = \sqrt{2\pi} . \tag{5.7.6}$$


Substituting this into Eqs. (5.7.5) and (5.7.4) we finally obtain the characteristic function of the normal distribution,

$$ \varphi(t) = \exp(\mathrm{i}ta)\exp\left(-\tfrac{1}{2}b^2t^2\right) . \tag{5.7.7} $$
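The result (5.7.7) can be checked numerically by evaluating the defining integral (5.7.3) directly. The following is a minimal Java sketch for that purpose only. The class name, the midpoint rule, and the cut-off of the integration range at $a \pm 12b$ are illustrative choices and do not come from the book's program library.

```java
/**
 * Sketch: numerical check of Eq. (5.7.7). E{exp(itx)} for the normal density (5.7.1)
 * is integrated by the midpoint rule and compared with exp(ita) exp(-b^2 t^2 / 2).
 */
class NormalCharacteristicFunction {

    /** Closed form (5.7.7), returned as {real part, imaginary part}. */
    static double[] closedForm(double t, double a, double b) {
        double modulus = Math.exp(-0.5 * b * b * t * t);
        return new double[] { modulus * Math.cos(t * a), modulus * Math.sin(t * a) };
    }

    /** Midpoint-rule evaluation of the integral (5.7.3) over [a - 12b, a + 12b]. */
    static double[] numerical(double t, double a, double b, int m) {
        double lo = a - 12.0 * b, hi = a + 12.0 * b;   // density is negligible outside
        double h = (hi - lo) / m;
        double re = 0.0, im = 0.0;
        for (int j = 0; j < m; j++) {
            double x = lo + (j + 0.5) * h;
            double density = Math.exp(-0.5 * (x - a) * (x - a) / (b * b))
                    / (Math.sqrt(2.0 * Math.PI) * b);
            re += Math.cos(t * x) * density;            // exp(itx) = cos(tx) + i sin(tx)
            im += Math.sin(t * x) * density;
        }
        return new double[] { re * h, im * h };
    }

    public static void main(String[] args) {
        double[] exact = closedForm(0.7, 1.0, 2.0);
        double[] approx = numerical(0.7, 1.0, 2.0, 20000);
        System.out.printf("closed form: %.8f %+.8fi%n", exact[0], exact[1]);
        System.out.printf("numerical:   %.8f %+.8fi%n", approx[0], approx[1]);
    }
}
```

Because the integrand is smooth and decays rapidly, the two lines printed by `main` should agree to many decimal places.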


For the case $a = 0$ one obtains from this the following interesting theorem:


A normal distribution with mean value zero has a characteristic function that itself has (up to normalization) the form of a normal distribution. The product of the variances of both functions is one. [Regarded as a Gaussian in $t$, $\exp(-\tfrac{1}{2}b^2t^2)$ has the variance $1/b^2$.]


If we now consider the sum of two independent normally distributed variables with parameters $a_1, b_1$ and $a_2, b_2$, then by applying Eq. (5.5.14) one immediately sees that the characteristic function of the sum, $\exp\{\mathrm{i}t(a_1+a_2)\}\exp\{-\tfrac{1}{2}(b_1^2+b_2^2)t^2\}$, is again of the form of Eq. (5.7.7). The sum of independent normally distributed quantities is therefore itself normally distributed, with mean $a_1+a_2$ and variance $b_1^2+b_2^2$. The Poisson distribution behaves in a similar way (cf. Example 5.5).
5.8  Quantitative  Properties  of  the  Normal  Distribution

Figure 5.7a shows the probability density of the standard Gaussian distribution $\phi_0(x)$ and the corresponding distribution function. By simple computation one can determine that the points of inflection of (5.6.1) are at $x = \pm 1$. [In the case of a general Gaussian distribution (5.7.1) they are at $x = a \pm b$.] The distribution function $\psi_0(x)$ gives the probability for the random variable to take on a value smaller than $x$:

$$ \psi_0(x) = P(\mathsf{x} < x) . \tag{5.8.1} $$

By symmetry one has

$$ P(|\mathsf{x}| > x) = 2\psi_0(-|x|) = 2\{1 - \psi_0(|x|)\} \tag{5.8.2} $$

or conversely, the probability to obtain a random value within an interval of width $2x$ about zero (the expectation value) is

$$ P(|\mathsf{x}| < x) = 2\psi_0(|x|) - 1 . \tag{5.8.3} $$


Since the integral (5.6.5) is not easy to evaluate, one typically finds the values of (5.8.1) and (5.8.3) from statistical tables, e.g., in Tables I.2 and I.3 of the appendix.

One can now extend this relation to the general Gaussian distribution given by Eq. (5.7.1). Its distribution function is

$$ \psi(x) = \psi_0\left(\frac{x-a}{b}\right) . \tag{5.8.4} $$


We are interested in finding the probability to obtain a random value inside (or outside) of a given multiple of $\sigma = b$ about the mean value:

$$ P(|\mathsf{x} - a| < n\sigma) = 2\psi_0\left(\frac{n\sigma}{b}\right) - 1 = 2\psi_0(n) - 1 . \tag{5.8.5} $$


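From Table I.3 we find

$$ \begin{aligned}
P(|\mathsf{x} - a| < \sigma) &= 68.3\% , & P(|\mathsf{x} - a| > \sigma) &= 31.7\% ,\\
P(|\mathsf{x} - a| < 2\sigma) &= 95.4\% , & P(|\mathsf{x} - a| > 2\sigma) &= 4.6\% ,\\
P(|\mathsf{x} - a| < 3\sigma) &= 99.7\% , & P(|\mathsf{x} - a| > 3\sigma) &= 0.3\% .
\end{aligned} \tag{5.8.6} $$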
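The entries of (5.8.6) follow from (5.8.5) once $\psi_0$ is available. The minimal Java sketch below reproduces them without a table. It evaluates $\psi_0(x) = \tfrac{1}{2} + \int_0^x \phi_0(t)\,\mathrm{d}t$ with Simpson's rule. That quadrature is an assumption made for illustration only; it is not the method of the book's appendix, and the class name is hypothetical.

```java
/**
 * Sketch: the standard normal distribution function psi0(x), Eq. (5.6.5), evaluated with
 * Simpson's rule (an illustrative choice), and P(|x - a| < n sigma) = 2 psi0(n) - 1, Eq. (5.8.5).
 */
class NormalProbabilities {

    /** Standard normal density phi0(x). */
    static double phi0(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    /** psi0(x) = 1/2 + integral from 0 to x of phi0, by composite Simpson's rule. */
    static double psi0(double x) {
        int m = 2000;                 // even number of subintervals
        double h = x / m;             // h < 0 for x < 0 yields the correct negative integral
        double sum = phi0(0.0) + phi0(x);
        for (int i = 1; i < m; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * phi0(i * h);
        }
        return 0.5 + sum * h / 3.0;
    }

    /** P(|x - a| < n sigma), Eq. (5.8.5). */
    static double inside(double n) {
        return 2.0 * psi0(n) - 1.0;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.printf("n = %d: inside %.4f, outside %.4f%n", n, inside(n), 1.0 - inside(n));
        }
    }
}
```

The inside probabilities come out close to 0.6827, 0.9545 and 0.9973. These round to the entries of (5.8.6).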


As we will see later in more detail, one can often assume that the measurement errors of a quantity are distributed according to a Gaussian distribution about zero. This means that the probability to obtain a value between $x$ and $x + \mathrm{d}x$ is given by


$$ P(x \le \mathsf{x} < x + \mathrm{d}x) = \phi(x)\,\mathrm{d}x . $$

The dispersion $\sigma$ of the distribution $\phi(x)$ is called the standard deviation or standard error. If the standard error of an instrument is known and one carries out a single measurement, then Eq. (5.8.6) tells us that the probability that the true value lies within an interval of plus or minus the standard error about the measured value is 68.3%. It is therefore common practice to multiply the standard error by a more or less arbitrary factor in order to raise this percentage. (One obtains about 99.7% for the factor 3.) This procedure is, however, misleading and often harmful. If this factor is not explicitly stated, a comparison of different measurements of the same quantity, and especially the calculation of a weighted average (cf. Example 9.1), is rendered impossible or is liable to be erroneous.

The quantiles [see Eq. (3.3.25)] of the standard normal distribution are of considerable interest. For the distribution function (5.6.5) one obtains by definition

$$ P(x_P) = P(\mathsf{x} < x_P) = \psi_0(x_P) . \tag{5.8.7} $$

The quantile $x_P$ is therefore given by the inverse function

$$ x_P = \Omega(P) \tag{5.8.8} $$

of the distribution function $\psi_0(x_P)$. This is computed numerically in Appendix C.4 and is given in Table I.4. Figure 5.8 shows a graphical representation.

We now consider the probability

$$ P'(x) = P(|\mathsf{x}| < x) , \qquad x > 0 , \tag{5.8.9} $$

for a quantity distributed according to the standard normal distribution to differ from zero in absolute value by less than $x$. Since

$$ P(x) = P(\mathsf{x} < x) = \psi_0(x) = \int_{-\infty}^{x}\phi_0(x)\,\mathrm{d}x = \frac{1}{2} + \int_{0}^{x}\phi_0(x)\,\mathrm{d}x = \frac{1}{2} + \frac{1}{2}P'(x) = \frac{1}{2}\left(P'(x) + 1\right) , $$

the inverse function, and with it the quantiles of the distribution function $P'(x)$, are obtained by substituting $(P'(x) + 1)/2$ for the argument of the inverse function of $P$:

$$ x_{P'} = \Omega'(P') = \Omega\left((P' + 1)/2\right) . \tag{5.8.10} $$

This function is tabulated in Table I.5.
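Both quantile functions can be computed by inverting $\psi_0$ numerically. Since $\psi_0$ is strictly increasing, simple bisection suffices. The Java sketch below is an illustration under assumptions: $\psi_0$ is again evaluated by Simpson's rule, and bisection stands in for the procedure of Appendix C.4. The class and method names are hypothetical.

```java
/** Sketch: quantiles Omega(P), Eq. (5.8.8), and Omega'(P'), Eq. (5.8.10), by bisection. */
class NormalQuantiles {

    static double phi0(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    /** psi0(x) by composite Simpson's rule (illustrative choice of quadrature). */
    static double psi0(double x) {
        int m = 2000;
        double h = x / m;
        double sum = phi0(0.0) + phi0(x);
        for (int i = 1; i < m; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * phi0(i * h);
        }
        return 0.5 + sum * h / 3.0;
    }

    /** x_P = Omega(P): bisection works because psi0 is strictly increasing. */
    static double omega(double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("P must lie in (0, 1)");
        }
        double lo = -10.0, hi = 10.0;   // ample bracket unless P is extremely close to 0 or 1
        for (int it = 0; it < 100; it++) {
            double mid = 0.5 * (lo + hi);
            if (psi0(mid) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    /** Omega'(P') = Omega((P' + 1) / 2), Eq. (5.8.10). */
    static double omegaPrime(double pPrime) {
        return omega(0.5 * (pPrime + 1.0));
    }

    public static void main(String[] args) {
        System.out.printf("Omega(0.975)  = %.5f%n", omega(0.975));
        System.out.printf("Omega'(0.95)  = %.5f%n", omegaPrime(0.95));
        System.out.printf("Omega'(0.683) = %.5f%n", omegaPrime(0.683));
    }
}
```

As (5.8.10) implies, `omega(0.975)` and `omegaPrime(0.95)` give the same value (about 1.95996). `omegaPrime(0.683)` lies close to 1.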
Fig. 5.8: Quantile of the standard normal distribution.


5.9  The  Central  Limit  Theorem 


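We will now prove the following important theorem. If $x_i$ are independent random variables with mean values $a$ and variances $b^2$, then the variable

$$ x = \lim_{n\to\infty}\sum_{i=1}^{n}x_i \tag{5.9.1} $$

is normally distributed with

$$ E(x) = na , \qquad \sigma^2(x) = nb^2 . \tag{5.9.2} $$

(The limit is to be understood in the sense that for large $n$ the distribution of the sum approaches a normal distribution with these parameters.)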

From Eq. (3.3.15) one then has that the variable

$$ \xi = \frac{1}{n}x = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}x_i \tag{5.9.3} $$

is normally distributed with

$$ E(\xi) = a , \qquad \sigma^2(\xi) = b^2/n . \tag{5.9.4} $$


To prove this we assume for simplicity that all of the $x_i$ have the same distribution. If we denote the characteristic function of the $x_i$ by $\varphi(t)$, then the sum of $n$ variables has the characteristic function $\{\varphi(t)\}^n$. We now assume that $a = 0$. (The general case can be related to this by a simple coordinate translation $x_i' = x_i - a$.) From (5.5.10) we have the first two derivatives of $\varphi(t)$ at $t = 0$,

$$ \varphi'(0) = 0 , \qquad \varphi''(0) = -\sigma^2 = -b^2 . $$

We can therefore expand

$$ \varphi_x(t) = 1 - \tfrac{1}{2}b^2t^2 + \cdots . $$

Instead of $x_i$, let us now choose

$$ u_i = \frac{x_i'}{b\sqrt{n}} = \frac{x_i - a}{b\sqrt{n}} $$

as the variable. If we consider $n$ to be fixed for the moment, then this implies a simple translation and a change of scale. The corresponding characteristic function is

$$ \varphi_{u_i}(t) = E\{\exp(\mathrm{i}tu_i)\} = E\left\{\exp\left(\frac{\mathrm{i}t x_i'}{b\sqrt{n}}\right)\right\} = \varphi_x\left(\frac{t}{b\sqrt{n}}\right) $$

or

$$ \varphi_{u_i}(t) = 1 - \frac{t^2}{2n} + \cdots . $$

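The higher-order terms are at most of the order $n^{-3/2}$. If we now consider the limiting case and use

$$ u = \lim_{n\to\infty}\frac{x - na}{b\sqrt{n}} = \lim_{n\to\infty}\sum_{i=1}^{n}u_i , \tag{5.9.5} $$

we obtain

$$ \varphi_u(t) = \lim_{n\to\infty}\{\varphi_{u_i}(t)\}^n = \lim_{n\to\infty}\left(1 - \frac{t^2}{2n}\right)^n $$

or

$$ \varphi_u(t) = \exp\left(-\tfrac{1}{2}t^2\right) . \tag{5.9.6} $$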


This is exactly the characteristic function of the standard normal distribution $\phi_0(u)$ [cf. Eq. (5.7.7) with $a = 0$, $b = 1$]. One therefore has $E(u) = 0$, $\sigma^2(u) = 1$. Using Eqs. (5.9.5) and (3.3.15) leads directly to the theorem.
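The theorem can be watched at work in a small simulation. The Java sketch below forms the standardized sum $u$ of (5.9.5) from $n$ independent variables that are uniformly distributed on $[0,1)$ (for which $a = 1/2$, $b^2 = 1/12$). It reports the mean, the variance and the fraction of values with $|u| < 1$. The choice of the uniform distribution, the seed, and all names are illustrative assumptions.

```java
import java.util.Random;

/** Sketch: standardized sums u = (x - n a) / (b sqrt(n)), Eq. (5.9.5), of uniform variables. */
class CentralLimitDemo {

    static double[] standardizedSums(int n, int nSamples, long seed) {
        Random rng = new Random(seed);
        double a = 0.5;                    // mean of the uniform distribution on [0, 1)
        double b = Math.sqrt(1.0 / 12.0);  // its standard deviation
        double[] u = new double[nSamples];
        for (int s = 0; s < nSamples; s++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += rng.nextDouble();
            }
            u[s] = (sum - n * a) / (b * Math.sqrt(n));
        }
        return u;
    }

    static double mean(double[] v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s / v.length;
    }

    /** Sample variance about the sample mean. */
    static double variance(double[] v) {
        double m = mean(v), s = 0.0;
        for (double x : v) s += (x - m) * (x - m);
        return s / (v.length - 1);
    }

    /** Fraction of values with |u| < c; for a standard normal this is 2 psi0(c) - 1. */
    static double fractionInside(double[] v, double c) {
        int count = 0;
        for (double x : v) if (Math.abs(x) < c) count++;
        return (double) count / v.length;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 12}) {
            double[] u = standardizedSums(n, 100000, 42L);
            System.out.printf("n = %2d: mean %.3f, variance %.3f, P(|u| < 1) = %.3f%n",
                    n, mean(u), variance(u), fractionInside(u, 1.0));
        }
    }
}
```

The mean and variance stay near 0 and 1 for every $n$, by construction. The fraction inside $|u| < 1$ starts at about $1/\sqrt{3} \approx 0.577$ for $n = 1$ and approaches the normal value 0.683 as $n$ grows.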
Example 5.6: Normal distribution as the limiting case of the binomial distribution

Suppose that the individual variables $x_i$ in (5.9.1) are described by the simple distribution given by (5.1.1) and (5.1.2), i.e., they can only take on the value 1 (with probability $p$) or 0 (with probability $1-p$). One then has $E(x_i) = p$, $\sigma^2(x_i) = p(1-p)$. The variable

$$ x^{(n)} = \sum_{i=1}^{n}x_i \tag{5.9.7} $$

then follows the binomial distribution, $P(x^{(n)} = k) = W_k^n$ [see Eqs. (5.1.3), (5.1.6), (5.1.7)]. As done for (5.9.5) let us consider the distribution of


$$ u^{(n)} = \sum_{i=1}^{n}\frac{x_i - p}{\sqrt{np(1-p)}} = \frac{1}{\sqrt{np(1-p)}}\left(\sum_{i=1}^{n}x_i - np\right) . \tag{5.9.8} $$


One clearly has $P(x^{(n)} = k) = P\left(u^{(n)} = (k - np)/\sqrt{np(1-p)}\right) = W_k^n$. These values, however, lie increasingly closer to each other on the $u^{(n)}$ axis as $n$ increases. Let us denote the distance between two neighboring values of $u^{(n)}$ by $\Delta u^{(n)} = 1/\sqrt{np(1-p)}$. Then the distribution of the discrete variable, $P(u^{(n)})/\Delta u^{(n)}$, becomes in the limit $n \to \infty$ the probability density of a continuous variable. According to the Central Limit Theorem this must be a standard normal distribution. This is illustrated in Fig. 5.9, where $P(u^{(n)})/\Delta u^{(n)}$ is shown for various possible values of $u^{(n)}$. ■
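The approach of $P(u^{(n)})/\Delta u^{(n)}$ to $\phi_0(u)$ can be quantified with a short computation. The Java sketch below builds the binomial probabilities $W_k^n$ with the recursion $W_{k+1}^n = W_k^n\,\frac{n-k}{k+1}\,\frac{p}{1-p}$, which avoids large factorials. It then records the largest deviation from the standard normal density over all $k$. The recursion, the names, and the choice of the maximum deviation as a measure are illustrative assumptions.

```java
/** Sketch: compares P(u^(n)) / Delta u^(n) from Eq. (5.9.8) with phi0(u^(n)). */
class BinomialToNormal {

    /** W_k^n for k = 0..n, built recursively (adequate for moderate n; (1-p)^n must not underflow). */
    static double[] binomial(int n, double p) {
        double[] w = new double[n + 1];
        w[0] = Math.pow(1.0 - p, n);
        for (int k = 0; k < n; k++) {
            w[k + 1] = w[k] * (n - k) / (k + 1) * p / (1.0 - p);
        }
        return w;
    }

    /** Largest |P(u)/Delta u - phi0(u)| over all lattice points u = (k - np)/sqrt(np(1-p)). */
    static double maxDeviation(int n, double p) {
        double s = Math.sqrt(n * p * (1.0 - p));
        double du = 1.0 / s;                       // spacing of neighboring u values
        double[] w = binomial(n, p);
        double max = 0.0;
        for (int k = 0; k <= n; k++) {
            double u = (k - n * p) / s;
            double phi0 = Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);
            max = Math.max(max, Math.abs(w[k] / du - phi0));
        }
        return max;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("n = %4d: max deviation %.5f%n", n, maxDeviation(n, 0.3));
        }
    }
}
```

The deviation shrinks roughly like $1/\sqrt{n}$, which is the behavior shown in Fig. 5.9.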


Example  5.7:  Error  model  of  Laplace 

In 1783 Laplace made the following remarks concerning the origin of errors of an observation. Suppose the true value of a quantity to be measured is $m_0$. Now let the measurement be disturbed by a large number $n$ of independent causes, each resulting in a disturbance of magnitude $\varepsilon$. For each disturbance there exists an equal probability for a variation of the measured value in either direction, i.e., $+\varepsilon$ or $-\varepsilon$. The measurement error is then composed of the sum of the individual disturbances. It is clear that in this model the probability distribution of measurement errors will be given by the binomial distribution. It is interesting nevertheless to follow the model somewhat further, since it leads directly to the famous Pascal triangle.

Figure 5.10 shows how the probability distribution is derived from the model. The starting point is the case of no disturbance, where the probability of measuring $m_0$ is equal to one. With one disturbance this probability is split equally between the neighboring possibilities $m_0 + \varepsilon$ and $m_0 - \varepsilon$. The same happens with every further disturbance. Of course the individual probabilities leading to the same measured value must be added.
Fig. 5.9: The quantity $P(u^{(n)})/\Delta u^{(n)}$ for various values of the discrete variable $u^{(n)}$ for increasing $n$.

Fig. 5.10: Connection between the Laplacian error model and the binomial distribution.
Each line of the resulting triangle contains the distribution $W_k^n$ ($k = 0, 1, \ldots, n$) of Eq. (5.1.3) for the case $p = q = 1/2$. Multiplied by $1/(p^k q^{n-k}) = 2^n$ it becomes a line of binomial coefficients of Pascal's triangle (cf. Appendix B).

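It is easy to relate this to Example 5.6 by extending Eq. (5.9.8) and substituting $p = 1/2$. For $n \to \infty$ the total measurement error

$$ \Delta m^{(n)} = \varepsilon\sum_{i=1}^{n}(2x_i - 1) = 2\left(\sum_{i=1}^{n}\varepsilon x_i - \frac{n\varepsilon}{2}\right) $$

follows a normal distribution with expectation value zero and standard deviation $\sqrt{n}\,\varepsilon$ (each disturbance contributes the variance $\varepsilon^2$). Thus Gaussian measurement errors can result from a large number of small independent disturbances. ■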
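The construction of Fig. 5.10 is easy to reproduce. At every disturbance each probability is split equally between its two neighbors. The Java sketch below does exactly this. It shows that multiplying row $n$ by $2^n$ gives Pascal's triangle, and that the standard deviation of the total error is $\sqrt{n}\,\varepsilon$. The class and method names are hypothetical.

```java
/** Sketch of Laplace's error model: repeated equal splitting of probabilities (Fig. 5.10). */
class LaplaceErrorModel {

    /**
     * Distribution after n disturbances of +eps or -eps (probability 1/2 each).
     * Entry k belongs to k disturbances of +eps, i.e. to the total error (2k - n) eps.
     */
    static double[] row(int n) {
        double[] w = {1.0};                         // no disturbance: m0 with probability one
        for (int step = 0; step < n; step++) {
            double[] next = new double[w.length + 1];
            for (int k = 0; k < w.length; k++) {
                next[k] += 0.5 * w[k];              // disturbance -eps
                next[k + 1] += 0.5 * w[k];          // disturbance +eps
            }
            w = next;
        }
        return w;
    }

    /** Standard deviation of the total error, computed from the row (its mean is zero by symmetry). */
    static double errorStdDev(int n, double eps) {
        double[] w = row(n);
        double var = 0.0;
        for (int k = 0; k <= n; k++) {
            double err = (2 * k - n) * eps;
            var += w[k] * err * err;
        }
        return Math.sqrt(var);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {              // rows multiplied by 2^n: Pascal's triangle
            StringBuilder line = new StringBuilder();
            for (double p : row(n)) {
                line.append(Math.round(p * Math.pow(2, n))).append(' ');
            }
            System.out.println(line.toString().trim());
        }
        System.out.println(errorStdDev(100, 0.01) + " vs " + Math.sqrt(100) * 0.01);
    }
}
```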

The identification of the measurement error distribution as Gaussian is of great significance in many computations, particularly for the method of least squares. The normal distribution for measurement errors is, however, not a law of nature. The causes of experimental errors can be individually very complicated. One cannot, therefore, find a distribution function that describes the behavior of measurement errors in all possible experiments. In particular, it is not always possible to guarantee symmetry and independence. One must ask in each individual case whether the measurement errors can be modeled by a Gaussian distribution. This can be done, for example, by means of a $\chi^2$-test applied to the distribution of a measured quantity (see Sect. 8.7). Before carrying out lengthy computations whose results are only meaningful for a Gaussian error distribution, it is always necessary to check the distribution of the experimental errors.


5.10  The  Multivariate  Normal  Distribution 

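Consider a vector $\mathbf{x}$ of $n$ variables,

$$ \mathbf{x} = (x_1, x_2, \ldots, x_n) . $$

We define the probability density of the joint normal distribution of the $x_i$ to be

$$ \phi(\mathbf{x}) = k\exp\left\{-\tfrac{1}{2}(\mathbf{x}-\mathbf{a})^{\mathrm{T}}B(\mathbf{x}-\mathbf{a})\right\} = k\exp\left\{-\tfrac{1}{2}g(\mathbf{x})\right\} \tag{5.10.1} $$

with

$$ g(\mathbf{x}) = (\mathbf{x}-\mathbf{a})^{\mathrm{T}}B(\mathbf{x}-\mathbf{a}) . \tag{5.10.2} $$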


Here $\mathbf{a}$ is an $n$-component vector and $B$ is an $n \times n$ matrix, which is symmetric and positive definite. Since $\phi(\mathbf{x})$ is clearly symmetric about the point $\mathbf{x} = \mathbf{a}$, one has

$$ \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}(\mathbf{x}-\mathbf{a})\,\phi(\mathbf{x})\,\mathrm{d}x_1\,\mathrm{d}x_2\cdots\mathrm{d}x_n = 0 , \tag{5.10.3} $$

that is,

$$ E(\mathbf{x}-\mathbf{a}) = 0 $$

or

$$ E(\mathbf{x}) = \mathbf{a} . \tag{5.10.4} $$

The vector of expectation values is therefore given directly by $\mathbf{a}$.

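We now differentiate Eq. (5.10.3) with respect to $\mathbf{a}$,

$$ \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\left[I - (\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^{\mathrm{T}}B\right]\phi(\mathbf{x})\,\mathrm{d}x_1\,\mathrm{d}x_2\cdots\mathrm{d}x_n = 0 . $$

This means that the expectation value of the quantity in square brackets vanishes,

$$ E\{(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^{\mathrm{T}}\}B = I $$

or

$$ C = E\{(\mathbf{x}-\mathbf{a})(\mathbf{x}-\mathbf{a})^{\mathrm{T}}\} = B^{-1} . \tag{5.10.5} $$

Comparing with Eq. (3.6.19) one sees that $C$ is the covariance matrix of the variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.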

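Because of the practical importance of the normal distribution, we would like to investigate the case of two variables in somewhat more detail. In particular we are interested in the correlation of the variables. One has

$$ C = B^{-1} = \begin{pmatrix} \sigma_1^2 & \operatorname{cov}(x_1,x_2) \\ \operatorname{cov}(x_1,x_2) & \sigma_2^2 \end{pmatrix} . \tag{5.10.6} $$

By inversion one obtains for $B$

$$ B = \frac{1}{\sigma_1^2\sigma_2^2 - \operatorname{cov}(x_1,x_2)^2}\begin{pmatrix} \sigma_2^2 & -\operatorname{cov}(x_1,x_2) \\ -\operatorname{cov}(x_1,x_2) & \sigma_1^2 \end{pmatrix} . \tag{5.10.7} $$

One sees that $B$ is a diagonal matrix if the covariances vanish. One then has

$$ B_0 = \begin{pmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{pmatrix} . \tag{5.10.8} $$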


If we substitute $B_0$ into Eq. (5.10.1), we obtain, as expected, the joint probability density of two independently normally distributed variables as the product of two normal distributions:

$$ \phi = k\exp\left(-\frac{1}{2}\frac{(x_1-a_1)^2}{\sigma_1^2}\right)\exp\left(-\frac{1}{2}\frac{(x_2-a_2)^2}{\sigma_2^2}\right) . \tag{5.10.9} $$


In this simple case the constant $k$ takes on the value

$$ k_0 = \frac{1}{2\pi\sigma_1\sigma_2} , $$

as can be determined by integration of (5.10.9) or simply by comparison with Eq. (5.7.1). In the general case of $n$ variables with non-vanishing covariances, one has

$$ k = \left(\frac{\det B}{(2\pi)^n}\right)^{1/2} . \tag{5.10.10} $$

Here $\det B$ is the determinant of the matrix $B$. If the variables are not independent, i.e., if the covariance does not vanish, then the expression for the normal distribution of two variables is somewhat more complicated.
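For two variables, Eqs. (5.10.7) and (5.10.10) are simple enough to code directly. The Java sketch below inverts the $2 \times 2$ covariance matrix and computes $k$. It then evaluates $\phi(\mathbf{x})$ of (5.10.1). With correlation the normalization becomes $k = 1/(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})$, which reduces to $k_0$ for $\rho = 0$. The class and method names are illustrative assumptions.

```java
/** Sketch: bivariate normal density from Eqs. (5.10.1), (5.10.7) and (5.10.10). */
class BivariateNormal {

    /** B = C^{-1} for the covariance matrix (5.10.6), by the explicit formula (5.10.7). */
    static double[][] inverseCovariance(double s1, double s2, double cov) {
        double det = s1 * s1 * s2 * s2 - cov * cov;   // det C
        if (det <= 0.0) {
            throw new IllegalArgumentException("C is not positive definite");
        }
        return new double[][] {
            { s2 * s2 / det, -cov / det },
            { -cov / det, s1 * s1 / det }
        };
    }

    /** k = (det B / (2 pi)^n)^(1/2), Eq. (5.10.10), for n = 2. */
    static double normalization(double[][] b) {
        double detB = b[0][0] * b[1][1] - b[0][1] * b[1][0];
        return Math.sqrt(detB / Math.pow(2.0 * Math.PI, 2));
    }

    /** phi(x) = k exp(-g(x)/2), Eq. (5.10.1), with g from Eq. (5.10.2). */
    static double density(double x1, double x2, double a1, double a2,
                          double s1, double s2, double cov) {
        double[][] b = inverseCovariance(s1, s2, cov);
        double d1 = x1 - a1, d2 = x2 - a2;
        double g = b[0][0] * d1 * d1 + 2.0 * b[0][1] * d1 * d2 + b[1][1] * d2 * d2;
        return normalization(b) * Math.exp(-0.5 * g);
    }

    public static void main(String[] args) {
        double[][] b = inverseCovariance(2.0, 1.0, 1.0);   // sigma1 = 2, sigma2 = 1, rho = 0.5
        System.out.printf("B = [[%.4f, %.4f], [%.4f, %.4f]], k = %.6f%n",
                b[0][0], b[0][1], b[1][0], b[1][1], normalization(b));
    }
}
```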

Let us consider the reduced variables

$$ u_i = \frac{x_i - a_i}{\sigma_i} , \qquad i = 1, 2 , $$

and make use of the correlation coefficient

$$ \rho = \frac{\operatorname{cov}(x_1,x_2)}{\sigma_1\sigma_2} = \operatorname{cov}(u_1,u_2) . $$

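Equation (5.10.1) then takes on the simple form

$$ \phi(u_1,u_2) = k\exp\left(-\tfrac{1}{2}\mathbf{u}^{\mathrm{T}}B\mathbf{u}\right) = k\exp\left\{-\tfrac{1}{2}g(\mathbf{u})\right\} \tag{5.10.11} $$

with

$$ B = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix} . \tag{5.10.12} $$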


Contours of equal probability density are characterized by a constant exponent in (5.10.11):

$$ -\frac{1}{2}\,\frac{1}{1-\rho^2}\left(u_1^2 + u_2^2 - 2u_1u_2\rho\right) = -\frac{1}{2}g(\mathbf{u}) = \text{const} . \tag{5.10.13} $$

Let us take for the moment $g(\mathbf{u}) = 1$. In the original variables Eq. (5.10.13) becomes

$$ \frac{(x_1-a_1)^2}{\sigma_1^2} - 2\rho\,\frac{x_1-a_1}{\sigma_1}\,\frac{x_2-a_2}{\sigma_2} + \frac{(x_2-a_2)^2}{\sigma_2^2} = 1-\rho^2 . \tag{5.10.14} $$
This is the equation of an ellipse centered around the point $(a_1, a_2)$. The principal axes of the ellipse make an angle $\alpha$ with respect to the axes $x_1$ and $x_2$. This angle and the half-diameters $p_1$ and $p_2$ can be determined from Eq. (5.10.14) by using the known properties of conic sections:

$$ \tan 2\alpha = \frac{2\rho\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} , \tag{5.10.15} $$

$$ p_1^2 = \frac{\sigma_1^2\sigma_2^2(1-\rho^2)}{\sigma_2^2\cos^2\alpha - 2\rho\sigma_1\sigma_2\sin\alpha\cos\alpha + \sigma_1^2\sin^2\alpha} , \tag{5.10.16} $$

$$ p_2^2 = \frac{\sigma_1^2\sigma_2^2(1-\rho^2)}{\sigma_2^2\sin^2\alpha + 2\rho\sigma_1\sigma_2\sin\alpha\cos\alpha + \sigma_1^2\cos^2\alpha} . \tag{5.10.17} $$


The ellipse with these properties is called the covariance ellipse of the bivariate normal distribution. Several such ellipses are depicted in Fig. 5.11. The covariance ellipse always lies inside the rectangle centered at the point $(a_1, a_2)$ whose half-sides are the standard deviations $\sigma_1$ and $\sigma_2$. It touches the rectangle at four points. For the extreme cases $\rho = \pm 1$ the ellipse degenerates into one of the two diagonals of this rectangle.
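Equations (5.10.15) to (5.10.17) translate directly into code. The Java sketch below computes $\alpha$, $p_1$ and $p_2$ and generates points on the ellipse by rotating $(p_1\cos\theta, p_2\sin\theta)$ through $\alpha$. It also gives a residual of (5.10.14) that vanishes on the ellipse. Using `Math.atan2` to solve (5.10.15) is our choice: it also covers $\sigma_1 = \sigma_2$, where the tangent in (5.10.15) is undefined. Names are illustrative.

```java
/** Sketch: covariance-ellipse parameters, Eqs. (5.10.15) to (5.10.17). */
class CovarianceEllipse {

    /** Returns {alpha, p1, p2}. */
    static double[] parameters(double s1, double s2, double rho) {
        // atan2 picks a valid solution of tan 2 alpha = 2 rho s1 s2 / (s1^2 - s2^2), even for s1 == s2
        double alpha = 0.5 * Math.atan2(2.0 * rho * s1 * s2, s1 * s1 - s2 * s2);
        double c = Math.cos(alpha), s = Math.sin(alpha);
        double num = s1 * s1 * s2 * s2 * (1.0 - rho * rho);
        double p1 = Math.sqrt(num / (s2 * s2 * c * c - 2.0 * rho * s1 * s2 * s * c + s1 * s1 * s * s));
        double p2 = Math.sqrt(num / (s2 * s2 * s * s + 2.0 * rho * s1 * s2 * s * c + s1 * s1 * c * c));
        return new double[] { alpha, p1, p2 };
    }

    /** Left-hand side minus right-hand side of Eq. (5.10.14); zero on the covariance ellipse. */
    static double residual(double x1, double x2, double a1, double a2,
                           double s1, double s2, double rho) {
        double u1 = (x1 - a1) / s1, u2 = (x2 - a2) / s2;
        return u1 * u1 - 2.0 * rho * u1 * u2 + u2 * u2 - (1.0 - rho * rho);
    }

    /** Point on the ellipse: (p1 cos theta, p2 sin theta) rotated by alpha and shifted to (a1, a2). */
    static double[] point(double theta, double a1, double a2, double[] par) {
        double alpha = par[0], xi = par[1] * Math.cos(theta), eta = par[2] * Math.sin(theta);
        return new double[] {
            a1 + xi * Math.cos(alpha) - eta * Math.sin(alpha),
            a2 + xi * Math.sin(alpha) + eta * Math.cos(alpha)
        };
    }

    public static void main(String[] args) {
        double[][] panels = { {2, 2, 0.0}, {3, 2, 0.7}, {2, 2, -0.3}, {2, 3, 0.99} };  // Fig. 5.11
        for (double[] q : panels) {
            double[] par = parameters(q[0], q[1], q[2]);
            System.out.printf("alpha = %7.2f deg, p1 = %.3f, p2 = %.3f%n",
                    Math.toDegrees(par[0]), par[1], par[2]);
        }
    }
}
```

The largest extent of the ellipse along $x_1$ is $\sqrt{p_1^2\cos^2\alpha + p_2^2\sin^2\alpha}$. It equals $\sigma_1$, which is the tangency to the rectangle mentioned above.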

Fig. 5.11: Covariance ellipses. [Four panels in the $(x_1, x_2)$ plane with $(a_1, a_2, \sigma_1, \sigma_2, \rho) = (1.00, 0.50, 2.00, 2.00, 0.00)$, $(-1.00, -1.00, 3.00, 2.00, 0.70)$, $(1.00, 0.50, 2.00, 2.00, -0.30)$ and $(2.00, 0.00, 2.00, 3.00, 0.99)$.]

Fig. 5.12: Probability density of a bivariate Gaussian distribution (left) and the corresponding covariance ellipse (right). The three rows of the figure differ only in the numerical value of the correlation coefficient $\rho$. [$\sigma_1 = 2.00$, $\sigma_2 = 1.00$; $\rho = -0.50$, $0.00$, $0.90$.]

From (5.10.14) it is clear that other lines of constant probability density (for $g \ne 1$) are also ellipses, concentric with and similar to the covariance ellipse, and situated inside (outside) of it for larger (smaller) values of the probability density. The bivariate normal distribution therefore corresponds to a surface in the three-dimensional space $(x_1, x_2, \phi)$ (Fig. 5.12), whose horizontal sections are concentric ellipses. For the largest probability density this ellipse collapses to the point $(a_1, a_2)$. The vertical sections through the center have the form of a Gaussian distribution whose width is directly proportional to the diameter of the covariance ellipse along which the section extends. The probability of observing a pair $x_1, x_2$ of random variables inside the covariance ellipse is equal to the integral

$$ \int_A \phi(\mathbf{x})\,\mathrm{d}\mathbf{x} = 1 - \mathrm{e}^{-1/2} = \text{const} , \tag{5.10.18} $$
where the region of integration $A$ is given by the area within the covariance ellipse (5.10.14). The relation (5.10.18) is obtained by applying the transformation of variables $\mathbf{y} = T(\mathbf{x} - \mathbf{a})$ with $T = B^{1/2}$ to the distribution $\phi(\mathbf{x})$. The resulting distribution has the properties $\sigma(y_1) = \sigma(y_2) = 1$ and $\operatorname{cov}(y_1, y_2) = 0$, i.e., it is of the form of (5.10.9). In this way the region of integration is transformed into a unit circle centered about the origin. In polar coordinates the integral then becomes $\int_0^1 r\,\mathrm{e}^{-r^2/2}\,\mathrm{d}r = 1 - \mathrm{e}^{-1/2}$.
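The value $1 - \mathrm{e}^{-1/2} \approx 0.393$ in (5.10.18) does not depend on $\sigma_1$, $\sigma_2$ or $\rho$, and a quick Monte Carlo check confirms this. The Java sketch below generates correlated normal pairs with `java.util.Random.nextGaussian()`, combining two independent standard normals in the usual Cholesky way. It counts the fraction with $g < 1$. The generation scheme, the seed and all names are assumptions made for this illustration.

```java
import java.util.Random;

/** Sketch: Monte Carlo estimate of the probability content of the ellipse g(x) = gMax. */
class EllipseProbability {

    static double fractionInside(double a1, double a2, double s1, double s2, double rho,
                                 double gMax, int n, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < n; i++) {
            double z1 = rng.nextGaussian(), z2 = rng.nextGaussian();
            // reduced variables with correlation rho (Cholesky factor of the reduced covariance matrix)
            double u1 = z1;
            double u2 = rho * z1 + Math.sqrt(1.0 - rho * rho) * z2;
            double x1 = a1 + s1 * u1, x2 = a2 + s2 * u2;
            // g in the original variables, i.e. the left side of (5.10.14) divided by 1 - rho^2
            double r1 = (x1 - a1) / s1, r2 = (x2 - a2) / s2;
            double g = (r1 * r1 - 2.0 * rho * r1 * r2 + r2 * r2) / (1.0 - rho * rho);
            if (g < gMax) inside++;
        }
        return (double) inside / n;
    }

    public static void main(String[] args) {
        double f = fractionInside(-1.0, -1.0, 3.0, 2.0, 0.7, 1.0, 200000, 1L);
        System.out.printf("simulated %.4f, expected %.4f%n", f, 1.0 - Math.exp(-0.5));
    }
}
```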

In our consideration of the normal distribution of measurement errors of a variable we found the interval $a - \sigma \le x \le a + \sigma$ to be the region in which the probability density $f(x)$ exceeds a given fraction, namely $\mathrm{e}^{-1/2}$, of its maximum value. The integral over this region was independent of $\sigma$. In the case of two variables, the role of this region is taken over by the covariance ellipse determined by $\sigma_1$, $\sigma_2$, and $\rho$, and not, as is sometimes incorrectly assumed, by the rectangle that circumscribes the ellipse in Fig. 5.11. The meaning of the covariance ellipse can also be seen from Fig. 5.13. Points 1 and 2, which lie on the covariance ellipse, correspond to equal probability densities ($P(1) = P(2) = P_e$), although point 1 is closer to the center in both coordinate directions. In addition, point 3 is more probable and point 4 less probable ($P(3) > P_e$, $P(4) < P_e$), even though point 4 is even closer to $(a_1, a_2)$ than point 3.


Fig. 5.13: Relative probability for various points from a bivariate Gaussian distribution ($P_1 = P_2 = P_e$, $P_3 > P_e$, $P_4 < P_e$).


For three variables one obtains instead of the covariance ellipse a covariance ellipsoid, and for $n$ variables a hyperellipsoid in an $n$-dimensional space (see also Sect. A.11). According to our construction, the covariance ellipsoid is the hypersurface in the $n$-dimensional space on which the function $g(\mathbf{x})$ in the exponent of the normal distribution (5.10.1) has the constant value $g(\mathbf{x}) = 1$. For other values $g(\mathbf{x}) = \text{const}$ one obtains similar ellipsoids which lie inside ($g < 1$) or outside ($g > 1$) of the covariance ellipsoid. In Sect. 6.6 it will be shown that the function $g(\mathbf{x})$ follows a $\chi^2$-distribution with $n$ degrees of freedom if $\mathbf{x}$ follows the normal distribution (5.10.1). The probability to find $\mathbf{x}$ inside the ellipsoid $g = \text{const}$ is therefore

$$ W = \int_0^g f(\chi^2; n)\,\mathrm{d}\chi^2 = P\left(\frac{n}{2}, \frac{g}{2}\right) . \tag{5.10.19} $$

Here $P$ is the incomplete gamma function given in Sect. D.5. For $g = 1$, that is, for the covariance ellipsoid in $n$ dimensions, this probability is

$$ W_n = P\left(\frac{n}{2}, \frac{1}{2}\right) . \tag{5.10.20} $$

Numerical values for small $n$ are

$$ \begin{aligned}
W_1 &= 0.68269 , & W_2 &= 0.39347 , & W_3 &= 0.19875 ,\\
W_4 &= 0.09020 , & W_5 &= 0.03743 , & W_6 &= 0.01439 .
\end{aligned} $$


The probability decreases rapidly as $n$ increases. In order to be able to give regions for various $n$ which correspond to equal probability content, one specifies a value $W$ on the left-hand side of (5.10.19) and determines the corresponding value of $g$. Then $g$ is the quantile with probability $W$ of the $\chi^2$-distribution with $n$ degrees of freedom (see also Appendix C.5),

$$ g = \chi^2_W(n) . \tag{5.10.21} $$

The ellipsoid that corresponds to this value of $g$, and thus contains $\mathbf{x}$ with the probability $W$, is called the confidence ellipsoid of probability $W$. This expression can be understood to mean that, e.g., for $W = 0.9$ one should have 90% confidence that $\mathbf{x}$ lies within the confidence ellipsoid.
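Both directions of (5.10.19) can be computed with a few lines of code. The first gives the probability content $W_n$ for a given $g$. The second gives the quantile $g = \chi^2_W(n)$ for a given $W$. The Java sketch below evaluates the regularized incomplete gamma function by its power series $P(a,x) = \mathrm{e}^{-x}x^a\,\Gamma(a+1)^{-1}\sum_{k\ge 0} x^k/[(a+1)\cdots(a+k)]$. Since $a = n/2$, the gamma function is only needed at multiples of $1/2$. The quantile is found by bisection. The series and bisection are stand-ins chosen for brevity, not the routines of Sect. D.5 or Appendix C.5, and the names are hypothetical.

```java
/** Sketch: W = P(n/2, g/2), Eq. (5.10.19), and its inverse g = chi^2_W(n), Eq. (5.10.21). */
class ChiSquareEllipsoid {

    /** Gamma(s) for s a positive multiple of 1/2: Gamma(1/2) = sqrt(pi), Gamma(1) = 1, Gamma(s+1) = s Gamma(s). */
    static double gammaHalfInteger(double s) {
        boolean integer = Math.abs(s - Math.rint(s)) < 1e-12;
        double g = integer ? 1.0 : Math.sqrt(Math.PI);
        for (double t = integer ? 1.0 : 0.5; t < s - 1e-12; t += 1.0) {
            g *= t;
        }
        return g;
    }

    /** Regularized lower incomplete gamma function P(a, x) by its power series. */
    static double incompleteGammaP(double a, double x) {
        if (x <= 0.0) return 0.0;
        double term = 1.0, sum = 1.0;
        for (int k = 1; k < 1000; k++) {
            term *= x / (a + k);
            sum += term;
            if (term < 1e-16 * sum) break;
        }
        return Math.exp(-x) * Math.pow(x, a) / gammaHalfInteger(a + 1.0) * sum;
    }

    /** Probability content of the covariance ellipsoid, W_n = P(n/2, 1/2), Eq. (5.10.20). */
    static double covarianceContent(int n) {
        return incompleteGammaP(0.5 * n, 0.5);
    }

    /** g = chi^2_W(n) by bracketing and bisection on W(g) = P(n/2, g/2). */
    static double quantile(double w, int n) {
        if (!(w > 0.0 && w < 1.0)) throw new IllegalArgumentException("W must lie in (0, 1)");
        double lo = 0.0, hi = 1.0;
        while (incompleteGammaP(0.5 * n, 0.5 * hi) < w) hi *= 2.0;   // bracket the root
        for (int it = 0; it < 100; it++) {
            double mid = 0.5 * (lo + hi);
            if (incompleteGammaP(0.5 * n, 0.5 * mid) < w) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("n = %d: W_n = %.5f, g for W = 0.9: %.4f%n",
                    n, covarianceContent(n), quantile(0.9, n));
        }
    }
}
```

For $n = 2$ the quantile can be checked by hand: $P(1, g/2) = 1 - \mathrm{e}^{-g/2}$, so $W = 0.9$ gives $g = 2\ln 10 \approx 4.605$.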

The variances $\sigma_i^2$ or the standard deviations $\Delta_i = \sigma_i$ also have a certain meaning for $n$ variables. The probability to observe the variable $x_i$ in the region $a_i - \sigma_i < x_i < a_i + \sigma_i$ is, as before, 68.3%, independent of the number $n$ of the variables. This only holds, however, when one places no requirements on the positions of any of the other variables $x_j$, $j \ne i$.


5.11  Convolutions  of  Distributions 

5.11.1  Folding  Integrals 

On various occasions we have already discussed sums of random variables, and in the derivation of the Central Limit Theorem, for example, we found the characteristic function to be a useful tool in such considerations. We would now like to discuss the distribution of the sum of two quantities, but for greater clarity we will not make use of the characteristic function.

Sums of two random quantities are often observed in experiments. One could be interested, for example, in the angular distribution of secondary particles from the decay of an elementary particle, which can often be used to determine the spin of the particle. The observed angle is the sum of two random quantities, namely the decay angle and its measurement error. Its distribution is called the convolution of the two original distributions.


Fig. 5.14: Integration region for (5.11.4).


Let the original quantities be $x$ and $y$ and the sum

$$ u = x + y . \tag{5.11.1} $$

A requirement for further treatment is that the original variables must be independent. In this case, the joint probability density is the product of simple densities,

$$ f(x, y) = f_x(x)\,f_y(y) . \tag{5.11.2} $$

If we now ask for the distribution function of $u$, i.e., for

$$ F(u) = P(\mathsf{u} < u) = P(\mathsf{x} + \mathsf{y} < u) , \tag{5.11.3} $$


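then this is obtained by integration of (5.11.2) over the hatched region $A$ in Fig. 5.14:

$$ F(u) = \iint_A f_x(x)f_y(y)\,\mathrm{d}x\,\mathrm{d}y = \int_{-\infty}^{\infty}f_x(x)\,\mathrm{d}x\int_{-\infty}^{u-x}f_y(y)\,\mathrm{d}y = \int_{-\infty}^{\infty}f_y(y)\,\mathrm{d}y\int_{-\infty}^{u-y}f_x(x)\,\mathrm{d}x . \tag{5.11.4} $$

By differentiation one obtains the probability density for $u$,

$$ f(u) = \frac{\mathrm{d}F(u)}{\mathrm{d}u} = \int_{-\infty}^{\infty}f_x(x)f_y(u-x)\,\mathrm{d}x = \int_{-\infty}^{\infty}f_y(y)f_x(u-y)\,\mathrm{d}y . \tag{5.11.5} $$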
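When the folding integral (5.11.5) cannot be done in closed form, it can be evaluated numerically for each value of $u$. The Java sketch below uses the midpoint rule over a finite range $[\mathrm{lo}, \mathrm{hi}]$ that must cover the region where $f_x$ is non-negligible. The densities are passed as `java.util.function.DoubleUnaryOperator` values. The method name and the quadrature are illustrative choices. For densities with jumps, such as the uniform distribution of the next example, the result is accurate only to about one step width.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: numerical folding integral f(u) = integral of f_x(x) f_y(u - x) dx, Eq. (5.11.5). */
class FoldingIntegral {

    /** Midpoint rule with m steps on [lo, hi]; [lo, hi] must contain the support of f_x. */
    static double convolve(DoubleUnaryOperator fx, DoubleUnaryOperator fy,
                           double u, double lo, double hi, int m) {
        double h = (hi - lo) / m;
        double sum = 0.0;
        for (int j = 0; j < m; j++) {
            double x = lo + (j + 0.5) * h;
            sum += fx.applyAsDouble(x) * fy.applyAsDouble(u - x);
        }
        return sum * h;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator uniform = x -> (x >= 0.0 && x < 1.0) ? 1.0 : 0.0;
        for (double u = 0.0; u <= 2.0001; u += 0.25) {
            System.out.printf("f(%.2f) = %.3f%n", u, convolve(uniform, uniform, u, -1.0, 3.0, 4000));
        }
    }
}
```

For two uniform distributions on $[0,1)$ the printed values trace out a triangle with its peak at $u = 1$.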
If $x$ or $y$ or both are only defined in a restricted region, then (5.11.5) is still true. The range of integration, however, may then be restricted. We will consider various cases:

(a) $0 \le x < \infty$, $-\infty < y < \infty$:

$$ f(u) = \int_{-\infty}^{u} f_x(u-y)f_y(y)\,\mathrm{d}y . \tag{5.11.6} $$

(Because $y = u - x$ and since for $x_{\min} = 0$ one has $y_{\max} = u$.)

(b) $0 \le x < \infty$, $0 \le y < \infty$:

$$ f(u) = \int_{0}^{u} f_x(u-y)f_y(y)\,\mathrm{d}y . \tag{5.11.7} $$

(c) $a \le x \le b$, $-\infty < y < \infty$:

$$ f(u) = \int_{a}^{b} f_x(x)f_y(u-x)\,\mathrm{d}x . \tag{5.11.8} $$

In the following example we demonstrate a further case (d), in which both $x$ and $y$ are bounded from below and from above.


Example  5.8:  Convolution  of  uniform  distributions 
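With

$$ f_x(x) = \begin{cases} 1 , & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad f_y(y) = \begin{cases} 1 , & 0 \le y < 1 \\ 0 & \text{otherwise} \end{cases} $$

and Eq. (5.11.8) we obtain

$$ f(u) = \int_0^1 f_y(u-x)\,\mathrm{d}x . $$

We substitute $v = u - x$, $\mathrm{d}v = -\mathrm{d}x$ and obtain

$$ f(u) = -\int_u^{u-1} f_y(v)\,\mathrm{d}v = \int_{u-1}^{u} f_y(v)\,\mathrm{d}v . \tag{5.11.9} $$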

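Clearly one has $0 \le u < 2$. We now consider separately the two cases

$$ \begin{aligned}
&\text{(a)}\quad 0 \le u < 1: & f_1(u) &= \int_0^u f_y(v)\,\mathrm{d}v = \int_0^u \mathrm{d}v = u ,\\
&\text{(b)}\quad 1 \le u < 2: & f_2(u) &= \int_{u-1}^1 f_y(v)\,\mathrm{d}v = \int_{u-1}^1 \mathrm{d}v = 2 - u .
\end{aligned} \tag{5.11.10} $$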
Fig. 5.15: Convolution of uniform distributions. Probability density of the sum $u$ of uniformly distributed random variables $x$: (a) $u = x$, (b) $u = x + x$, (c) $u = x + x + x$.


Note that the lower (upper) limit of integration is never lower (higher) than 0 (1), since $f_y$ vanishes outside this interval. The result is a triangular distribution (Fig. 5.15).

If this result is folded again with a uniform distribution, i.e., if $u$ is the sum of three independent uniformly distributed variables, then one obtains

$$ f(u) = \begin{cases} \tfrac{1}{2}u^2 , & 0 \le u < 1 ,\\ \tfrac{1}{2}(-2u^2 + 6u - 3) , & 1 \le u < 2 ,\\ \tfrac{1}{2}(u - 3)^2 , & 2 \le u < 3 . \end{cases} \tag{5.11.11} $$


The proof is left to the reader. The distribution consists of three parabolic sections (Fig. 5.15) and is already similar to the Gaussian distribution predicted by the Central Limit Theorem. ■
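One way to check (5.11.11) without doing the proof is by simulation. The Java sketch below compares a histogram of $u = x_1 + x_2 + x_3$, built from uniform random numbers, with the three parabolic sections. It also lets one confirm that the sections join continuously at $u = 1$ and $u = 2$. The bin width of 0.1, the seed and the names are illustrative assumptions.

```java
import java.util.Random;

/** Sketch: density of the sum of three uniform variables, Eq. (5.11.11), and a simulation of it. */
class ThreeUniforms {

    static double density(double u) {
        if (u < 0.0 || u >= 3.0) return 0.0;
        if (u < 1.0) return 0.5 * u * u;
        if (u < 2.0) return 0.5 * (-2.0 * u * u + 6.0 * u - 3.0);
        return 0.5 * (u - 3.0) * (u - 3.0);
    }

    /** Histogram of n simulated sums in 30 bins of width 0.1 on [0, 3), normalized to a density. */
    static double[] simulate(int n, long seed) {
        Random rng = new Random(seed);
        int bins = 30;
        double width = 3.0 / bins;
        double[] hist = new double[bins];
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble() + rng.nextDouble() + rng.nextDouble();
            int b = (int) (u / width);
            if (b >= 0 && b < bins) hist[b] += 1.0;
        }
        for (int b = 0; b < bins; b++) {
            hist[b] /= n * width;                 // relative frequency -> probability density
        }
        return hist;
    }

    public static void main(String[] args) {
        double[] hist = simulate(300000, 7L);
        for (int b = 0; b < hist.length; b += 3) {
            double center = (b + 0.5) * 0.1;
            System.out.printf("u = %.2f: simulated %.3f, Eq. (5.11.11) %.3f%n", center, hist[b], density(center));
        }
    }
}
```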


5.11.2  Convolutions  with  the  Normal  Distribution 

Suppose a quantity of experimental interest $x$ can be considered to be a random variable with probability density $f_x(x)$. It is measured with a measurement error $y$, which follows a normal distribution with a mean of zero and a variance of $\sigma^2$. The result of the measurement is then the sum

$$ u = x + y . \tag{5.11.12} $$

Its probability density is [see also (5.11.4)]

$$ f(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} f_x(x)\exp\left[-(u-x)^2/2\sigma^2\right]\mathrm{d}x . \tag{5.11.13} $$

By carrying out many measurements, $f(u)$ can be experimentally determined. The experimenter is interested, however, in the function $f_x(x)$. Unfortunately, Eq. (5.11.13) cannot in general be solved for $f_x(x)$; this is only possible for a restricted class of functions $f(u)$. Therefore one usually approaches the problem in a different way. From earlier measurements or theoretical considerations one possesses knowledge about the form of $f_x(x)$; e.g., one might assume that $f_x(x)$ is described by a uniform distribution, without, however, knowing its boundaries $a$ and $b$. One then carries out the convolution (5.11.13), compares the resulting function $f(u)$ with the experiment, and in this way determines the unknown parameters (in our example $a$ and $b$).

In many cases it is not even possible to perform the integration (5.11.13) analytically. Numerical procedures, e.g., the Monte Carlo method, then have to be used. Sometimes approximations (cf. Example 5.11) give useful results. Because of the importance in many experiments of convolution with the normal distribution we will study some examples.


Example  5.9:  Convolution  of  uniform  and  normal  distributions 
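Using Eqs. (3.3.26) and (5.11.8) and substituting $v = (x - u)/\sigma$ we obtain

$$ \begin{aligned}
f(u) &= \frac{1}{b-a}\,\frac{1}{\sqrt{2\pi}\,\sigma}\int_a^b \exp\left[-(u-x)^2/2\sigma^2\right]\mathrm{d}x\\
&= \frac{1}{b-a}\,\frac{1}{\sqrt{2\pi}}\int_{(a-u)/\sigma}^{(b-u)/\sigma}\exp\left(-\tfrac{1}{2}v^2\right)\mathrm{d}v\\
&= \frac{1}{b-a}\left[\psi_0\left(\frac{b-u}{\sigma}\right) - \psi_0\left(\frac{a-u}{\sigma}\right)\right] .
\end{aligned} \tag{5.11.14} $$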


The function $\psi_0$ has already been defined in (5.6.5). Figure 5.16 shows the result for $a = 0$, $b = 6$, $\sigma = 1$. If one has $|b - a| \gg \sigma$ (as is the case in Fig. 5.16), then near each edge one of the two terms in brackets in (5.11.14) is practically 0 or 1: near $u = a$ the first term is close to 1, and near $u = b$ the second term is close to 0. The rising edge of the uniform distribution at $u = a$ is therefore replaced by the distribution function of the normal distribution with standard deviation $\sigma$ (see also Fig. 5.7). The falling edge at $u = b$ is its "mirror image". ■
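The closed form (5.11.14) can be compared with a direct numerical evaluation of the folding integral. The Java sketch below does this for the parameters of Fig. 5.16. As before, $\psi_0$ is computed with Simpson's rule, an illustrative substitute for the book's tabulated function, and the class name is hypothetical.

```java
/** Sketch: convolution of a uniform and a normal distribution, Eq. (5.11.14). */
class UniformNormalConvolution {

    static double phi0(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    /** psi0(x) by composite Simpson's rule (illustrative choice of quadrature). */
    static double psi0(double x) {
        int m = 2000;
        double h = x / m;
        double sum = phi0(0.0) + phi0(x);
        for (int i = 1; i < m; i++) {
            sum += (i % 2 == 1 ? 4.0 : 2.0) * phi0(i * h);
        }
        return 0.5 + sum * h / 3.0;
    }

    /** Eq. (5.11.14). */
    static double closedForm(double u, double a, double b, double sigma) {
        return (psi0((b - u) / sigma) - psi0((a - u) / sigma)) / (b - a);
    }

    /** The first line of (5.11.14), evaluated directly by the midpoint rule over [a, b]. */
    static double numerical(double u, double a, double b, double sigma, int m) {
        double h = (b - a) / m, sum = 0.0;
        for (int j = 0; j < m; j++) {
            double x = a + (j + 0.5) * h;
            sum += Math.exp(-0.5 * (u - x) * (u - x) / (sigma * sigma));
        }
        return sum * h / ((b - a) * Math.sqrt(2.0 * Math.PI) * sigma);
    }

    public static void main(String[] args) {
        for (double u = -2.0; u <= 8.0; u += 1.0) {      // a = 0, b = 6, sigma = 1 as in Fig. 5.16
            System.out.printf("u = %4.1f: closed form %.5f, numerical %.5f%n",
                    u, closedForm(u, 0.0, 6.0, 1.0), numerical(u, 0.0, 6.0, 1.0, 20000));
        }
    }
}
```

In the middle of the interval the result is practically $1/(b-a)$. At $u = a$ it is half of that value, as expected for the distribution function $\psi_0$ at zero.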


Fig. 5.16: Convolution of a uniform and a Gaussian distribution.


Example 5.10: Convolution of two normal distributions. "Quadratic addition of errors"

If one convolutes two normal distributions with mean values 0 and variances $\sigma_x^2$ and $\sigma_y^2$, one obtains

$$ f(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-u^2/2\sigma^2\right) , \qquad \sigma^2 = \sigma_x^2 + \sigma_y^2 . \tag{5.11.15} $$
The proof has already been shown in Sect. 5.7 with the help of the characteristic function. It can also be obtained by computation of the folding integral (5.11.5). If the distributions $f_x(x)$ and $f_y(y)$ describe two independent sources of measurement errors, the result (5.11.15) is known as the "quadratic addition of errors". ■
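The folding-integral route to (5.11.15) can be followed numerically. The Java sketch below evaluates (5.11.5) for two centered normal densities with the midpoint rule. It compares the result with a single normal density of width $\sigma = \sqrt{\sigma_x^2 + \sigma_y^2}$, computed with `Math.hypot`. The cut-off at $\pm 12\sigma_x$ and the names are illustrative choices.

```java
/** Sketch: numerical check of the "quadratic addition of errors", Eq. (5.11.15). */
class GaussianConvolution {

    static double gauss(double x, double sigma) {
        return Math.exp(-0.5 * x * x / (sigma * sigma)) / (Math.sqrt(2.0 * Math.PI) * sigma);
    }

    /** Folding integral (5.11.5) by the midpoint rule over x in [-12 sx, 12 sx]. */
    static double numerical(double u, double sx, double sy, int m) {
        double lo = -12.0 * sx, hi = 12.0 * sx;
        double h = (hi - lo) / m, sum = 0.0;
        for (int j = 0; j < m; j++) {
            double x = lo + (j + 0.5) * h;
            sum += gauss(x, sx) * gauss(u - x, sy);
        }
        return sum * h;
    }

    /** Eq. (5.11.15): a normal density with sigma = sqrt(sx^2 + sy^2). */
    static double closedForm(double u, double sx, double sy) {
        return gauss(u, Math.hypot(sx, sy));
    }

    public static void main(String[] args) {
        for (double u : new double[] {0.0, 2.5, 5.0, 7.5}) {   // sx = 3, sy = 4, so sigma = 5
            System.out.printf("u = %.1f: numerical %.8f, closed form %.8f%n",
                    u, numerical(u, 3.0, 4.0, 40000), closedForm(u, 3.0, 4.0));
        }
    }
}
```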


Example  5.11:  Convolution  of  exponential  and  normal  distributions 
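With

$$ f_x(x) = \frac{1}{\tau}\exp(-x/\tau) , \quad x > 0 , \qquad f_y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-y^2/2\sigma^2\right) , $$

Eq. (5.11.6) takes on the following form:

$$ f(u) = \frac{1}{\sqrt{2\pi}\,\sigma\tau}\int_{-\infty}^{u}\exp\left[-(u-y)/\tau\right]\exp\left(-y^2/2\sigma^2\right)\mathrm{d}y . $$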

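We can rewrite the exponent

$$ -\frac{1}{2\sigma^2\tau}\left[2\sigma^2(u-y) + \tau y^2\right] = -\frac{1}{2\sigma^2\tau}\left[2\sigma^2u - 2\sigma^2y + \tau y^2 + \frac{\sigma^4}{\tau} - \frac{\sigma^4}{\tau}\right] = -\frac{u}{\tau} + \frac{\sigma^2}{2\tau^2} - \frac{1}{2\sigma^2}\left(y - \frac{\sigma^2}{\tau}\right)^2 $$

and obtain

$$ f(u) = \frac{1}{\sqrt{2\pi}\,\sigma\tau}\exp\left(-\frac{u}{\tau} + \frac{\sigma^2}{2\tau^2}\right)\int_{-\infty}^{u}\exp\left[-\frac{1}{2\sigma^2}\left(y - \frac{\sigma^2}{\tau}\right)^2\right]\mathrm{d}y . $$

We now require that $\sigma \ll \tau$, i.e., that the measurement error is much smaller than the typical value (width) of the exponential distribution. In addition, we only consider values of $u$ for which $u - \sigma^2/\tau \gg \sigma$, i.e., $u \gg \sigma$. The integral is then approximately equal to $\sqrt{2\pi}\,\sigma$, or

$$ f(u) \approx \frac{1}{\tau}\exp\left(-\frac{u}{\tau} + \frac{\sigma^2}{2\tau^2}\right) . $$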

In a semi-logarithmic representation, i.e., in a plot of ln f(u) versus u, the curve f(u) lies above the curve $f_x(x)$ (taken at x = u) by an amount $\sigma^2/2\tau^2$, since

$$ \ln f(u) = \ln\frac{1}{\tau} - \frac{u}{\tau} + \frac{\sigma^2}{2\tau^2} = \ln f_x(u) + \frac{\sigma^2}{2\tau^2} . $$
This is plotted in Fig. 5.17. The result can be understood qualitatively in the following way. For each small x interval of the exponential distribution, the convolution leads with equal probability to a shift to the left or to the right. Since, however, the exponential distribution is larger for smaller values of x, the contributions to the convolution f(u) at a given u originate with greater probability from the left than from the right. This leads to an overall shift of f(u) to the right with respect to $f_x(x)$. ■



Fig.  5.17:  Convolution  of  exponential  and  normal  distributions. 
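The shift by $\sigma^2/2\tau^2$ in the semi-logarithmic plot can be checked numerically. The sketch below (hypothetical class ExpGaussConvolution) evaluates the convolution integral for f(u) with Simpson's rule, cutting the lower limit off at −12σ, and prints ln f(u) − ln f_x(u). As a side remark, the remaining integral above equals exactly $\sqrt{2\pi}\,\sigma\,\psi_0((u - \sigma^2/\tau)/\sigma)$, so the approximation amounts to setting this ψ0 factor to 1.

```java
/** Sketch: numerical check of the shift sigma^2 / (2 tau^2) in Example 5.11. */
final class ExpGaussConvolution {

    /** f(u) = 1/(sqrt(2 pi) sigma tau) * integral_{-inf}^{u} exp[-(u-y)/tau] exp(-y^2/(2 sigma^2)) dy,
     *  by Simpson's rule with m (even) panels; the lower limit is cut off at -12 sigma. */
    static double numerical(double u, double tau, double sigma, int m) {
        double lo = -12 * sigma;
        if (u <= lo) {
            return 0.0;
        }
        double h = (u - lo) / m, sum = 0.0;
        for (int i = 0; i <= m; i++) {
            double y = lo + i * h;
            double g = Math.exp(-(u - y) / tau - y * y / (2 * sigma * sigma));
            double weight = (i == 0 || i == m) ? 1 : (i % 2 == 1 ? 4 : 2);
            sum += weight * g;
        }
        return sum * h / 3.0 / (Math.sqrt(2 * Math.PI) * sigma * tau);
    }

    /** Approximation valid for sigma << tau and u >> sigma. */
    static double approx(double u, double tau, double sigma) {
        return Math.exp(-u / tau + sigma * sigma / (2 * tau * tau)) / tau;
    }

    public static void main(String[] args) {
        double tau = 1.0, sigma = 0.3;
        for (double u = 1.0; u <= 5.0; u += 1.0) {
            double lnF = Math.log(numerical(u, tau, sigma, 4000));
            double lnFx = -u / tau - Math.log(tau);          // ln f_x(u) of the exponential density
            System.out.printf("u = %.1f  ln f(u) - ln f_x(u) = %.5f  (sigma^2/2tau^2 = %.5f)%n",
                    u, lnF - lnFx, sigma * sigma / (2 * tau * tau));
        }
    }
}
```

With τ = 1 and σ = 0.3 the printed differences should approach σ²/2τ² = 0.045 once u is several σ above zero; at u = 1 the ψ0 factor still lowers the value slightly.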


5.12  Example  Programs 

Example Program 5.1: Class E1Distrib to simulate empirical frequency and demonstrate statistical fluctuations

The program simulates the problem of Example 5.1. It allows input of values for n_exp, n_el, and P(A), and then consecutively performs n_exp simulated experiments. In each experiment n_el objects are analyzed. Each object has a probability P(A) to have the property A. For each experiment one line of output is produced containing the current number i_exp of the experiment, the number N_A of objects with the property A, and the frequency h_A = N_A/n_el with which the property A was found. The fluctuation of h_A around the known input value P(A) in the individual experiments gives a good impression of the statistical error of an experiment.
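The logic described for Example Program 5.1 can be sketched in a few lines of Java. This is not the book's class E1Distrib; the class name EmpiricalFrequency, the use of java.util.Random, and the fixed seed are assumptions made to keep the example self-contained.

```java
import java.util.Random;

/** Hypothetical sketch of the simulation described for Example Program 5.1. */
final class EmpiricalFrequency {

    /** Performs nExp experiments with nEl objects each; returns N_A for every experiment. */
    static int[] simulate(int nExp, int nEl, double pA, long seed) {
        Random rng = new Random(seed);
        int[] nA = new int[nExp];
        for (int iExp = 0; iExp < nExp; iExp++) {
            for (int k = 0; k < nEl; k++) {
                if (rng.nextDouble() < pA) {
                    nA[iExp]++;                              // this object has the property A
                }
            }
        }
        return nA;
    }

    public static void main(String[] args) {
        int nExp = 10, nEl = 100;
        double pA = 0.3;
        int[] nA = simulate(nExp, nEl, pA, 1L);
        for (int iExp = 0; iExp < nExp; iExp++) {
            // one output line per experiment: i_exp, N_A, h_A = N_A / n_el
            System.out.printf("%3d %4d %6.3f%n", iExp + 1, nA[iExp], (double) nA[iExp] / nEl);
        }
    }
}
```

Rerunning with a different seed gives different fluctuations of h_A around P(A), which is exactly the effect the example program is meant to show.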
Example Program 5.2: Class E2Distrib to simulate the experiment of Rutherford and Geiger

The principle of the experiment of Rutherford and Geiger is described in Example 5.3. It is simulated as follows. Input quantities are the number N of decays observed and the number n_int of partial intervals ΔT of the total observation time T. For simplicity the length of each partial interval is set equal to one, so that T = n_int. A total of N random events are simulated by simply generating N random numbers uniformly distributed between 0 and T. They are entered into a histogram with n_int intervals. The histogram is first displayed graphically and then analyzed numerically. For each number k = 0, 1, ... the program determines how many intervals N(k) of the histogram have k entries. The numbers N(k) themselves are presented in the form of another histogram.

Show that for the process simulated in this example program one obtains in the limit N → ∞

$$ N(k) = n_{\rm int}\,W_k^N(p = 1/n_{\rm int}) . $$

If N is increased step by step and at the same time λ = Np = N/n_int is kept constant, then for large N one has

$$ W_k^N(p = \lambda/N) \to \frac{\lambda^k}{k!}e^{-\lambda} $$

and, in the limit N → ∞,

$$ N(k) = n_{\rm int}\,\frac{\lambda^k}{k!}e^{-\lambda} . $$

Check the above statements by running the program with suitable pairs of numbers, e.g., (N, n_int) = (4, 2), (40, 20), ..., (2000, 1000), by reading the numbers N(k) from the graphics display and comparing them with the statements above.
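A hedged sketch of the simulation described for Example Program 5.2 follows (not the book's class E2Distrib; the names and the seed are illustrative). It bins N uniformly distributed event times into n_int unit intervals, counts how many intervals contain exactly k events, and prints the Poisson prediction next to each count.

```java
import java.util.Random;

/** Hypothetical sketch of the simulation described for Example Program 5.2. */
final class RutherfordGeigerSketch {

    /** Distributes nDecays uniform random times over [0, T) with T = nInt (unit intervals)
     *  and returns counts, where counts[k] = N(k) = number of intervals with exactly k events. */
    static int[] intervalCounts(int nDecays, int nInt, long seed) {
        Random rng = new Random(seed);
        int[] entries = new int[nInt];                       // histogram of the decay times
        for (int i = 0; i < nDecays; i++) {
            int bin = (int) (rng.nextDouble() * nInt);       // time t in [0, T) -> unit interval
            entries[bin]++;
        }
        int[] counts = new int[nDecays + 1];                 // k ranges from 0 to nDecays
        for (int e : entries) {
            counts[e]++;
        }
        return counts;
    }

    /** Poisson prediction n_int * lambda^k * exp(-lambda) / k!, with lambda = N / n_int. */
    static double poissonExpectation(int k, int nDecays, int nInt) {
        double lambda = (double) nDecays / nInt;
        double term = Math.exp(-lambda);
        for (int j = 1; j <= k; j++) {
            term *= lambda / j;
        }
        return nInt * term;
    }

    public static void main(String[] args) {
        int n = 2000, nInt = 1000;                           // lambda = 2
        int[] counts = intervalCounts(n, nInt, 3L);
        for (int k = 0; k <= 6; k++) {
            System.out.printf("k = %d  N(k) = %4d  Poisson = %7.1f%n",
                    k, counts[k], poissonExpectation(k, n, nInt));
        }
    }
}
```

With (N, n_int) = (2000, 1000), i.e., λ = 2, the observed N(k) should scatter around n_int λ^k e^{−λ}/k!, for example about 135 intervals with k = 0.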

Example Program 5.3: Class E3Distrib to simulate Galton's board

Galton's board is a simple implementation of Laplace's model described in Example 5.7. The vertical board contains rows of horizontally oriented nails as shown in Fig. 5.18. The rows of nails are labeled j = 1, 2, ..., n, and row j has j nails. One by one, a total of N_exp balls fall onto the nail in row 1. There each ball is deflected with probability p to the right and with probability (1 − p) to the left. (In a realistic board one has p = 1/2.) The distance between the nails is chosen in such a way that in each case the ball hits one of the two nails in row 2, and there again it is deflected to the right with probability p. After falling through n rows each ball assumes one of n + 1 places, which we denote by k = 0 (on the left), k = 1, ..., k = n (on the right). After a total of N_exp experiments (i.e., balls) one finds N(k) balls for each value k.

The program allows input of numerical values for N_exp, n, and p. For each experiment the number k is first set to zero, and n random numbers r_j are generated from a uniform distribution and analyzed. For each r_j < p (corresponding to a deflection to the right in row j) the number k is increased by 1. For each experiment the value of k is entered into a histogram. After all experiments have been simulated, the histogram is displayed.
Show that in the limit N_exp → ∞ one has

$$ N(k) = N_{\rm exp}\,W_k^n(p) . $$

By choosing and entering suitable pairs of numbers (n, p), e.g., (n, p) = (1, 0.5), (2, 0.25), (10, 0.05), (100, 0.005), approximate the Poisson limit

$$ N(k) = N_{\rm exp}\,\frac{\lambda^k}{k!}e^{-\lambda} , \qquad \lambda = np , $$

and compare these predictions with the results of your simulations.
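The Galton board simulation of Example Program 5.3 might look as follows in a minimal Java sketch (hypothetical class GaltonBoardSketch, not the book's E3Distrib). For each ball n uniform random numbers r_j are drawn, k counts the cases r_j < p, and the resulting histogram is compared with the binomial and Poisson predictions.

```java
import java.util.Random;

/** Hypothetical sketch of the simulation described for Example Program 5.3 (Galton's board). */
final class GaltonBoardSketch {

    /** Drops nExp balls through n rows; returns hist with hist[k] = N(k), k = 0..n. */
    static int[] simulate(int nExp, int n, double p, long seed) {
        Random rng = new Random(seed);
        int[] hist = new int[n + 1];
        for (int e = 0; e < nExp; e++) {
            int k = 0;                                       // k is reset for every ball
            for (int j = 1; j <= n; j++) {
                if (rng.nextDouble() < p) {
                    k++;                                     // r_j < p: deflection to the right in row j
                }
            }
            hist[k]++;
        }
        return hist;
    }

    /** Binomial probability W_k^n(p) = C(n,k) p^k (1-p)^(n-k). */
    static double binomialProbability(int k, int n, double p) {
        double c = 1.0;
        for (int j = 1; j <= k; j++) {
            c *= (double) (n - k + j) / j;                   // builds C(n,k) as a running product
        }
        return c * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }

    public static void main(String[] args) {
        int nExp = 10000, n = 10;
        double p = 0.05, lambda = n * p;
        int[] hist = simulate(nExp, n, p, 9L);
        double poisson = Math.exp(-lambda);                  // lambda^k e^{-lambda} / k!, updated below
        for (int k = 0; k <= 4; k++) {
            if (k > 0) {
                poisson *= lambda / k;
            }
            System.out.printf("k = %d  N(k) = %5d  binomial = %8.1f  Poisson = %8.1f%n",
                    k, hist[k], nExp * binomialProbability(k, n, p), nExp * poisson);
        }
    }
}
```

For (n, p) = (10, 0.05), i.e., λ = 0.5, the binomial and Poisson columns already lie close to each other; the pairs with larger n and smaller p listed above bring them closer still.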


6.  Samples 


In the last chapter we discussed a number of distributions, but we have not specified how they are realized in a particular case. We have only given the probability that a random variable x will lie within an interval with boundaries x and x + dx. This probability depends on certain parameters describing its distribution (like λ in the case of the Poisson distribution), which are usually unknown. We therefore have no direct knowledge of the probability distribution and have to approximate it by a frequency distribution obtained experimentally. The set of measurements performed for this purpose, called a sample, is necessarily finite. To discuss the elements of sampling theory we first have to introduce a number of new definitions.

6.1  Random  Samples.  Distribution 
of  a  Sample.  Estimators 

Every sample is taken from a set of elements which correspond to the possible results of an individual observation. Such a set, which usually has infinitely many elements, is called a population. If a sample of n elements is taken from it, then we say that the sample has the size n. Let the distribution of the random variable x in the population be given by the probability density f(x). We are interested in the values of x assumed by the individual elements of the sample. Suppose that we take ℓ samples of size n and find the following values for x:

1st sample: $x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}$,
j-th sample: $x_1^{(j)}, x_2^{(j)}, \ldots, x_n^{(j)}$,
ℓ-th sample: $x_1^{(\ell)}, x_2^{(\ell)}, \ldots, x_n^{(\ell)}$.
We group the result of one sample into an n-dimensional vector

$$ \mathbf{x}^{(j)} = (x_1^{(j)}, x_2^{(j)}, \ldots, x_n^{(j)}) , \qquad (6.1.1) $$

which can be considered the position vector in an n-dimensional sample space (Sect. 2.1). Its probability density is

$$ g(\mathbf{x}) = g(x_1, x_2, \ldots, x_n) . \qquad (6.1.2) $$

This function must fulfill two conditions in order for the sample to be random:

(a) The individual x_i must be independent, i.e., one must have

$$ g(\mathbf{x}) = g_1(x_1)\,g_2(x_2)\cdots g_n(x_n) . \qquad (6.1.3) $$

(b) The individual marginal distributions must be identical and equal to the probability density f(x) of the population,

$$ g_1(x) = g_2(x) = \cdots = g_n(x) = f(x) . \qquad (6.1.4) $$

Comparing with (6.1.2), it is clear that there is a simple relation between a population and a sample only if these conditions are fulfilled. In the following we will mean by the word sample a random sample unless otherwise stated.

It should be emphasized that in the actual process of sampling it is often quite difficult to ensure randomness. Because of the large variety of applications, a general prescription cannot be given. In order to obtain reliable results from sampling, we have to take the utmost precautions to meet the requirements (6.1.3) and (6.1.4). Independence (6.1.3) can be checked to a certain extent by comparing the frequency distributions of the first, second, ... elements of a large number of samples. It is very difficult, however, to ensure that the samples in fact come from a population with the probability density f(x). If the elements of the population can be numbered, it is often useful to use random numbers to select the elements for the sample.

We now suppose that the n elements of a sample are ordered according to the value of the variable, e.g., marked on the x axis, and we ask for the number n_x of elements of the sample for which x_i < x, for arbitrary x. The function

$$ W_n(x) = n_x/n \qquad (6.1.5) $$

takes on the role of an empirical distribution function. It is a step function that increases by 1/n as soon as x passes one of the values x_i of an element of the sample. It is called the sample distribution function. It is clearly an approximation for F(x), the distribution function of the population, which it approaches in the limit n → ∞.
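The sample distribution function (6.1.5) is easy to evaluate once the sample has been sorted. In the sketch below (hypothetical class name) n_x is found by a binary search for the first element not smaller than x, so that W_n(x) counts only elements with x_i < x, as in the definition above. The demonstration in main uses the seven measured values of Example 6.1.

```java
import java.util.Arrays;

/** Hypothetical sketch of the sample distribution function W_n(x) = n_x / n of (6.1.5). */
final class SampleDistributionFunction {

    private final double[] sorted;                           // sample elements in ascending order

    SampleDistributionFunction(double[] sample) {
        sorted = sample.clone();
        Arrays.sort(sorted);
    }

    /** W_n(x) = n_x / n, where n_x is the number of elements with x_i < x. */
    double value(double x) {
        int lo = 0, hi = sorted.length;                      // binary search for the first x_i >= x
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return (double) lo / sorted.length;                  // lo equals n_x
    }

    public static void main(String[] args) {
        SampleDistributionFunction w = new SampleDistributionFunction(
                new double[] {10.5, 10.9, 9.2, 9.8, 9.0, 10.4, 10.7});
        for (double x = 8.5; x <= 11.5; x += 0.5) {
            System.out.printf("W_n(%.1f) = %.3f%n", x, w.value(x));
        }
    }
}
```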


A function of the elements of a sample (6.1.1) is called a statistic. Since x is a random variable, a statistic is itself a random variable. The most important example is the sample mean

$$ \bar{x} = \frac{1}{n}(x_1 + x_2 + \ldots + x_n) . \qquad (6.1.6) $$

A typical problem of data analysis is the following. The general mathematical form of the probability density of the population is known. In radioactive decay, for example, the number of nuclei which decay before the time t = τ is $N_\tau = N_0(1 - \exp(-\lambda\tau))$, if N_0 nuclei existed at time t = 0. Here, however, the decay constant λ is, in general, not known. By taking a finite sample (measuring a finite number of decay times of individual nuclei) we want to determine the parameter λ as accurately as possible. Since such a task cannot be solved exactly, because the sample is finite, one speaks of the estimation of parameters. To estimate a parameter λ of a distribution function one uses an estimator

$$ S = S(x_1, x_2, \ldots, x_n) . \qquad (6.1.7) $$

An estimator is said to be unbiased if for arbitrary sample size the expectation value of the (random) quantity S is equal to the parameter to be estimated:

$$ E\{S(x_1, x_2, \ldots, x_n)\} = \lambda \quad \text{for all } n . \qquad (6.1.8) $$

An estimator is said to be consistent if its variance vanishes for arbitrarily large sample size, i.e., if

$$ \lim_{n\to\infty}\sigma^2(S) = 0 . \qquad (6.1.9) $$

Often one can give a lower limit for the variance of an estimator of a parameter. If one finds an estimator S_0 whose variance is equal to this limit, then one apparently has the "best possible" estimator. S_0 is then said to be an efficient estimator for λ.


6.2  Samples  from  Continuous  Populations: 

Mean  and  Variance  of  a  Sample 

The case of greatest interest in applications concerns a sample from an infinitely large continuous population described by the probability density f(x). The sample mean (6.1.6) is a random variable, as are all statistics. Let us consider its expectation value

$$ E(\bar{x}) = \frac{1}{n}\{E(x_1) + E(x_2) + \ldots + E(x_n)\} = \hat{x} . \qquad (6.2.1) $$
This expectation value is equal to the expectation value $\hat{x} = E(x)$ of x. Since Eq. (6.2.1) holds for all values of n, the arithmetic mean of a sample is, as one would expect, an unbiased estimator for the mean of the population. The characteristic function of the random variable $\bar{x}$ is

$$ \varphi_{\bar{x}}(t) = \left[\varphi_x\!\left(\frac{t}{n}\right)\right]^n . \qquad (6.2.2) $$


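Next we are interested in the variance of $\bar{x}$,

$$ \sigma^2(\bar{x}) = E\{(\bar{x} - E(\bar{x}))^2\} = E\left\{\left(\frac{x_1 + x_2 + \ldots + x_n}{n} - \hat{x}\right)^2\right\} = \frac{1}{n^2}E\{[(x_1 - \hat{x}) + (x_2 - \hat{x}) + \ldots + (x_n - \hat{x})]^2\} . $$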


Since all of the x_i are independent, all of the cross terms of the type $E\{(x_i - \hat{x})(x_j - \hat{x})\}$, i ≠ j (i.e., all of the covariances), vanish, and we obtain

$$ \sigma^2(\bar{x}) = \frac{1}{n}\sigma^2(x) . \qquad (6.2.3) $$


One thus shows that $\bar{x}$ is a consistent estimator for $\hat{x}$. The variance (6.2.3), however, is not itself a random variable: it contains the unknown population variance σ²(x) and is therefore not directly obtainable by experiment. As a definition for the sample variance we could try using the arithmetic mean of squared differences

$$ s'^2 = \frac{1}{n}\{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2\} . \qquad (6.2.4) $$

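Its expectation value is

$$ E(s'^2) = \frac{1}{n}E\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\} = \frac{1}{n}E\left\{\sum_{i=1}^{n}(x_i - \hat{x} + \hat{x} - \bar{x})^2\right\} $$
$$ = \frac{1}{n}E\left\{\sum_{i=1}^{n}(x_i - \hat{x})^2 + \sum_{i=1}^{n}(\hat{x} - \bar{x})^2 + 2\sum_{i=1}^{n}(x_i - \hat{x})(\hat{x} - \bar{x})\right\} $$
$$ = \frac{1}{n}\sum_{i=1}^{n}\left\{E((x_i - \hat{x})^2) - E((\bar{x} - \hat{x})^2)\right\} $$
$$ = \frac{1}{n}\left\{n\sigma^2(x) - n\,\frac{1}{n}\sigma^2(x)\right\} = \frac{n-1}{n}\sigma^2(x) . \qquad (6.2.5) $$

In the third step we have used $\sum_i (x_i - \hat{x}) = n(\bar{x} - \hat{x})$, so that the last two sums together equal $-n(\bar{x} - \hat{x})^2$; in the fourth step, (6.2.3).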
Hence one sees that the sample variance defined in this way is a biased estimator for the population variance, having an expectation value smaller than σ²(x). We can see directly from (6.2.5), however, the size of the bias. We therefore change our definition (6.2.4) and write for the sample variance

$$ s^2 = \frac{1}{n-1}\{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2\} . \qquad (6.2.6) $$

This is now an unbiased estimator for σ²(x). The value (n − 1) in the denominator appears at first to be somewhat strange. One must consider, however, that for n = 1 the sample mean is equal to the value x of the sole element of the sample ($\bar{x} = x$) and that therefore the quantity (6.2.4) would vanish. That is related to the fact that in (6.2.4), and also in (6.2.6), the sample mean $\bar{x}$ was used instead of the population mean $\hat{x}$, since the latter was not known. Part of the information contained in the sample first had to be used to determine $\bar{x}$ and was not available for the calculation of the variance. The effective number of elements available for calculating the variance is therefore reduced. This is taken into consideration by reducing the denominator of the arithmetic mean (6.2.4). The same line of reasoning is repeated quantitatively in Sect. 6.5.
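The bias factor (n − 1)/n in (6.2.5) can be made visible by simulation. The following sketch (hypothetical class VarianceBias, using java.util.Random with a fixed seed) draws many samples of size n = 5 from the standard normal distribution, for which σ²(x) = 1, and averages both s'² from (6.2.4) and s² from (6.2.6).

```java
import java.util.Random;

/** Hypothetical Monte Carlo check of the bias of s'^2 (6.2.4) versus s^2 (6.2.6). */
final class VarianceBias {

    /** Returns {average of s'^2, average of s^2} over nRep samples of size n from N(0,1). */
    static double[] averageEstimates(int n, int nRep, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[n];
        double sumBiased = 0.0, sumUnbiased = 0.0;
        for (int r = 0; r < nRep; r++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                x[i] = rng.nextGaussian();
                mean += x[i];
            }
            mean /= n;
            double q = 0.0;                                  // sum of squared deviations from the sample mean
            for (double xi : x) {
                q += (xi - mean) * (xi - mean);
            }
            sumBiased += q / n;                              // s'^2, divisor n
            sumUnbiased += q / (n - 1);                      // s^2, divisor n - 1
        }
        return new double[] {sumBiased / nRep, sumUnbiased / nRep};
    }

    public static void main(String[] args) {
        int n = 5;
        double[] avg = averageEstimates(n, 200_000, 11L);
        System.out.printf("<s'^2> = %.4f (expected (n-1)/n = %.4f), <s^2> = %.4f (expected 1)%n",
                avg[0], (n - 1.0) / n, avg[1]);
    }
}
```

The average of s'² should come out close to 0.8 and that of s² close to 1.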

If we substitute the estimator for the population variance (6.2.6) into (6.2.3), we obtain an estimator for the variance of the mean

$$ s^2(\bar{x}) = \frac{1}{n}s^2(x) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(x_i - \bar{x})^2 . \qquad (6.2.7) $$

The corresponding standard deviation can be considered to be the error of the mean

$$ \Delta\bar{x} = \sqrt{s^2(\bar{x})} = s(\bar{x}) = \frac{1}{\sqrt{n}}s(x) . \qquad (6.2.8) $$

Of course we are also interested in the error of the sample variance (6.2.6). In Sect. 6.6 we will show that this quantity can be determined under the assumption that the population follows a normal distribution. We will use the result here ahead of time. The variance of s² is

$$ \mathrm{var}(s^2) = \left(\frac{\sigma^2}{n-1}\right)^2 2(n-1) = \frac{2\sigma^4}{n-1} . \qquad (6.2.9) $$

If we substitute the estimator (6.2.6) into the right-hand side for σ² and take the square root, we obtain for the error of the sample variance

$$ \Delta s^2 = s^2\sqrt{\frac{2}{n-1}} . \qquad (6.2.10) $$


Finally we give explicit expressions for estimators of the sample standard deviation and its error. The first is simply the square root of the sample variance

$$ s = \sqrt{s^2} = \frac{1}{\sqrt{n-1}}\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2} . \qquad (6.2.11) $$

The error of the sample standard deviation is obtained from (6.2.10) by error propagation, which gives

$$ \Delta s = \frac{\Delta s^2}{2s} = \frac{s}{\sqrt{2(n-1)}} . \qquad (6.2.12) $$


Example  6.1:  Computation  of  the  sample  mean  and  variance  from  data 

Suppose one has n = 7 measurements of a certain quantity (e.g., the length of an object). Their values are 10.5, 10.9, 9.2, 9.8, 9.0, 10.4, 10.7. The computation is made easier if one uses the fact that all of the measured values are near a = 10, i.e., they are of the form x_i = a + δ_i. The relation (6.1.6) then gives

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n}(a + \delta_i) = a + \frac{1}{n}\sum_{i=1}^{n}\delta_i = a + \Delta $$

with

$$ \Delta = \frac{1}{7}(0.5 + 0.9 - 0.8 - 0.2 - 1.0 + 0.4 + 0.7) = 0.5/7 \approx 0.07 . $$

We thus have $\bar{x} = 10 + \Delta = 10.07$.

The sample variance is computed according to (6.2.6) to be

$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right) . $$

The result can be obtained either from the first or from the last expression of the relation above. In the last one only one difference is computed, not n. The numbers to be squared, however, are usually considerably larger, and one must consider the problem of rounding errors. We therefore use the original expression

$$ s^2 = \frac{1}{6}\{0.43^2 + 0.83^2 + 0.87^2 + 0.27^2 + 1.07^2 + 0.33^2 + 0.63^2\} $$
$$ = \frac{1}{6}\{0.1849 + 0.6889 + 0.7569 + 0.0729 + 1.1449 + 0.1089 + 0.3969\} = 3.3543/6 \approx 0.56 . $$

The sample standard deviation is s ≈ 0.75. From (6.2.8), (6.2.10), and (6.2.12) we finally obtain $\Delta\bar{x} \approx 0.28$, $\Delta s^2 \approx 0.32$, and Δs ≈ 0.21. ■

Naturally one does not usually compute the sample mean and variance by hand, but rather with the class Sample and its methods.
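For illustration only, here is a minimal Java sketch of such computations; it is not the book's class Sample, whose methods may differ. The hypothetical class SampleSummary evaluates (6.1.6), (6.2.6), (6.2.8), (6.2.10), (6.2.11), and (6.2.12). It forms the deviations from the mean first (a two-pass computation), in line with the remark in Example 6.1 about rounding errors.

```java
/** Hypothetical sketch: sample mean, variance, standard deviation and their errors. */
final class SampleSummary {

    final int n;
    final double mean, variance, sd, errMean, errVariance, errSd;

    SampleSummary(double[] x) {
        n = x.length;
        double sum = 0.0;
        for (double xi : x) {
            sum += xi;
        }
        mean = sum / n;                                      // (6.1.6)
        double q = 0.0;
        for (double xi : x) {
            q += (xi - mean) * (xi - mean);                  // deviations first: avoids cancellation
        }
        variance = q / (n - 1);                              // (6.2.6)
        sd = Math.sqrt(variance);                            // (6.2.11)
        errMean = sd / Math.sqrt(n);                         // (6.2.8)
        errVariance = variance * Math.sqrt(2.0 / (n - 1));   // (6.2.10)
        errSd = sd / Math.sqrt(2.0 * (n - 1));               // (6.2.12)
    }

    public static void main(String[] args) {
        SampleSummary s = new SampleSummary(new double[] {10.5, 10.9, 9.2, 9.8, 9.0, 10.4, 10.7});
        System.out.printf("mean = %.4f +- %.4f%n", s.mean, s.errMean);
        System.out.printf("s^2  = %.4f +- %.4f%n", s.variance, s.errVariance);
        System.out.printf("s    = %.4f +- %.4f%n", s.sd, s.errSd);
    }
}
```

For the data of Example 6.1 it gives a mean of about 10.071, s² ≈ 0.559, s ≈ 0.748, an error of the mean of about 0.283, Δs² ≈ 0.323, and Δs ≈ 0.216; the last value differs slightly from the 0.21 quoted in the example, presumably because rounded intermediate values were used there.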

6.3  Graphical  Representation  of  Samples: 

Histograms  and  Scatter  Plots 

After  the  theoretical  considerations  of  the  last  sections  we  now  turn  to  some 
simple  practical  aspects  of  the  analysis  of  sample  data.  An  important  tool  for 
this  is  the  representation  of  the  data  in  graphical  form. 

A sample

$$ x_1, x_2, \ldots, x_n , $$

which depends on a single variable x, can be represented simply by means of tick marks on an x axis. We will call such a representation a one-dimensional scatter plot. It contains all the information about the sample. Table 6.1 contains the values x_1, x_2, ..., x_n of a sample of size 100, obtained from measuring the resistance R of 100 individual resistors of nominal value 200 kΩ. After the sample was obtained, the measurements were ordered.

Table 6.1: Values of resistance R (in kΩ) of 100 individual resistors of nominal value 200 kΩ, listed row by row. The data are graphically represented in Fig. 6.1.

193.199  195.673  195.757  196.051  196.092  196.596  196.679  196.763  196.847  197.267
197.392  197.477  198.189  198.650  198.944  199.070  199.111  199.153  199.237  199.698
199.572  199.614  199.824  199.908  200.118  200.160  200.243  200.285  200.453  200.704
200.746  200.830  200.872  200.914  200.956  200.998  200.998  201.123  201.208  201.333
201.375  201.543  201.543  201.584  201.711  201.878  201.919  202.004  202.004  202.088
202.172  202.172  202.297  202.339  202.381  202.507  202.591  202.633  202.716  202.884
203.051  203.052  203.094  203.094  203.177  203.178  203.219  203.764  203.765  203.848
203.890  203.974  204.184  204.267  204.352  204.352  204.729  205.106  205.148  205.231
205.357  205.400  205.483  206.070  206.112  206.154  206.155  206.615  206.657  206.993
207.243  207.621  208.124  208.375  208.502  208.628  208.670  208.711  210.012  211.394


Fig.  6.1:  Representation  of  the  data  from  Table  6.1  as  (a)  a  one-dimensional  scatter  plot,  (b) 
a  bar  diagram,  (c)  a  step  diagram,  and  (d)  a  diagram  of  measured  points  with  error  bars. 


Figure  6.1a  shows  the  corresponding  scatter  plot.  Qualitatively  one  can 
estimate  the  mean  and  variance  from  the  position  and  width  of  the  clustering 
of  tick  marks. 

Another graphical representation, which uses the second dimension available on the paper, is usually better suited to visualize the sample. The x axis is used as abscissa and divided into r intervals

$$ \xi_1, \xi_2, \ldots, \xi_r $$

of equal width Δx. These intervals are called bins. The centers of the bins have the x-values

$$ x_1, x_2, \ldots, x_r . $$

On the vertical axis one plots the corresponding numbers of sample elements

$$ n_1, n_2, \ldots, n_r $$

that fall into the bins ξ_1, ξ_2, ..., ξ_r. The diagram obtained in this way is called a histogram of the sample. It can be interpreted as a frequency distribution, since h_k = n_k/n is a frequency, i.e., a measure of the probability p_k to observe a sample element in the interval ξ_k. For the graphical form of histograms, various methods are used. In a bar diagram the values n_k are represented as bars perpendicular to the x axis at the x_k values (Fig. 6.1b). In a step diagram the n_k are represented as horizontal lines that cover the entire width Δx of the interval. Neighboring horizontal lines are connected by perpendicular lines (Fig. 6.1c). The area above each interval of the x axis is then proportional to the number n_k of sample elements in the interval. (If one uses the area in the interval for the graphical representation of n_k, then the bins can also have different widths.) In economics bar diagrams are most commonly used. (Sometimes one also sees diagrams in which, instead of bars, line segments are used to connect the tips of the bars. In contrast to the step diagram, the resulting figure does not have an area proportional to the sample size n.) In the natural sciences, step diagrams are more common.

In Sect. 6.8 we will show that as long as the values n_k are not too small, their statistical errors are given by $\Delta n_k = \sqrt{n_k}$. In order to plot them on a graph, the observed values n_k can be drawn as points with vertical error bars ending at the points $n_k \pm \sqrt{n_k}$ (Fig. 6.1d).

It is clear that the relative errors $\Delta n_k/n_k = 1/\sqrt{n_k}$ decrease for increasing n_k, i.e., for a sample of fixed size n they decrease for increasing bin width of the histogram. On the other hand, by choosing a larger bin width, one loses any finer structure of the data with respect to the variable x. The ability of a histogram to convey information therefore depends crucially on the appropriate choice of the bin width, usually found only after several attempts.


Example  6.2:  Histograms  of  the  same  sample  with  various  choices 
of  bin  width 

In Fig. 6.2 four histograms of the same sample are shown. The population is a Gaussian distribution, which is also shown as a continuous curve. The curve was scaled in such a way that the area under it is equal to the area under the histogram. Although the information contained in the plot is greater for a smaller bin width (for vanishing bin width the histogram becomes a one-dimensional scatter plot), one notices the similarity between the histogram and the Gaussian curve much more easily for the larger bin width. This is because for the larger bin width the relative statistical fluctuations of the contents of individual bins are smaller, so that the individual steps of the histogram differ less from the curve. ■

Constructing a histogram from a sample is a simple programming task. Suppose the histogram has n_x bins of width Δx, with the first interval extending from x = x_0 to x = x_0 + Δx. The contents of the histogram are put into an array hist, with the first bin in hist[0], the second bin in hist[1], etc.

Fig. 6.2: Histogram of the same sample from a Gaussian distribution represented with four different bin widths.

The histogram is then specified in the computer by the array hist and the three values x_0, Δx, and n_x. The class Histogram permits construction and administration of a histogram.
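The bookkeeping described above (the array hist plus x_0, Δx, and n_x) might be sketched as follows. The class SimpleHistogram is hypothetical and not the book's class Histogram; entries outside the range are simply dropped in this sketch, whereas a real implementation may count underflow and overflow separately. The error √n_k of Sect. 6.8 is included for convenience.

```java
/** Hypothetical sketch of a fixed-binning histogram (x0, dx, nx) with sqrt(n_k) errors. */
final class SimpleHistogram {

    final double x0, dx;
    final int nx;
    final int[] hist;                                        // hist[0]: [x0, x0 + dx), hist[1]: next bin, ...

    SimpleHistogram(double x0, double dx, int nx) {
        this.x0 = x0;
        this.dx = dx;
        this.nx = nx;
        this.hist = new int[nx];
    }

    /** Adds one sample element; values outside [x0, x0 + nx*dx) are ignored. */
    void enter(double x) {
        int k = (int) Math.floor((x - x0) / dx);
        if (k >= 0 && k < nx) {
            hist[k]++;
        }
    }

    /** Convenience method: enters several values and returns this histogram. */
    SimpleHistogram enterAll(double... xs) {
        for (double x : xs) {
            enter(x);
        }
        return this;
    }

    double binCenter(int k) {
        return x0 + (k + 0.5) * dx;
    }

    /** Statistical error sqrt(n_k), meaningful when n_k is not too small (Sect. 6.8). */
    double error(int k) {
        return Math.sqrt(hist[k]);
    }

    public static void main(String[] args) {
        // a few values from Table 6.1, bins of 2 kOhm from 192 to 212 kOhm
        SimpleHistogram h = new SimpleHistogram(192.0, 2.0, 10)
                .enterAll(193.199, 199.237, 200.998, 201.543, 202.004, 203.094, 206.154, 211.394);
        for (int k = 0; k < h.nx; k++) {
            System.out.printf("bin center %.1f: n_k = %d +- %.2f%n", h.binCenter(k), h.hist[k], h.error(k));
        }
    }
}
```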

Graphical display of histograms can be accomplished using methods of the class DatanGraphics (Appendix F). With them the user can freely adjust all of the parameters that determine the appearance of the plot, such as the page format, scale factors, colors, line thickness, etc. Often it is convenient to use the class GraphicsWithHistogram, which does not allow for this freedom but produces the graphical output of a histogram stored in the computer with a single call.

A histogram allows a first direct look at the nature of the data. It answers questions such as "Are the data more or less distributed according to a Gaussian?" or "Do there exist points exceptionally far away from the average value?". If the histogram leads one to conclude that the population is distributed according to a Gaussian, then the mean and standard deviation can be estimated directly from the plot. The mean is the center of gravity of the histogram. The standard deviation is obtained as in the following example.
Example 6.3: Full width at half maximum (FWHM)

If the form of the histogram allows one to assume that the represented sample originates from a Gaussian distribution, then one can draw by hand a Gaussian bell curve that follows the histogram as closely as possible. The position of the maximum is a good estimate of the mean. One then draws a horizontal line at half of the height of the maximum. This crosses the bell curve at the points x_a and x_b. The quantity

$$ \Gamma = x_b - x_a $$

is called the full width at half maximum (FWHM). One can easily determine for a Gaussian distribution the simple relation

$$ \sigma \approx 0.4247\,\Gamma \qquad (6.3.1) $$

between the standard deviation and the FWHM. This expression can be used to estimate the standard deviation of a sample when Γ is obtained from a histogram. ■
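The numerical constant in (6.3.1) follows from setting the Gaussian equal to half of its maximum value. Writing $\hat{x}$ for the mean and Γ = x_b − x_a for the FWHM, a short derivation is:

$$ \exp\left(-\frac{(x - \hat{x})^2}{2\sigma^2}\right) = \frac{1}{2} \;\Rightarrow\; x_{a,b} = \hat{x} \mp \sigma\sqrt{2\ln 2} , $$
$$ \Gamma = x_b - x_a = 2\sqrt{2\ln 2}\,\sigma \approx 2.3548\,\sigma , \qquad \sigma = \frac{\Gamma}{2\sqrt{2\ln 2}} \approx 0.4247\,\Gamma . $$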

We  now  use  the  Monte  Carlo  method  (Chap.  4)  together  with  histograms 
in  order  to  illustrate  the  concepts  of  mean,  standard  deviation,  and  variance  of 
a  sample,  and  their  errors,  as  introduced  in  Sect.  6.2. 

Example  6.4:  Investigation  of  characteristic  quantities  of  samples  from  a 
Gaussian  distribution  with  the  Monte  Carlo  method 

We generate successively 1000 samples of size N = 100 from the standard normal distribution and compute the mean $\bar{x}$, variance s², and standard deviation s for each sample, as well as the errors $\Delta\bar{x}$, Δs², and Δs, with the methods of the class Sample. We then produce for each of the six quantities a histogram (Fig. 6.3) containing 1000 entries. Since each of the quantities is computed from sums of many random quantities, we expect in all cases that the histograms should resemble Gaussian distributions. From the histogram for $\bar{x}$ we obtain a full width at half maximum of about 0.25, and hence a standard deviation of approximately 0.1. Indeed the histogram for $\Delta\bar{x}$ shows an approximately Gaussian distribution with mean value $\Delta\bar{x} = 0.1$. (From the width of this histogram one could determine the error of the error $\Delta\bar{x}$ of the mean $\bar{x}$!) From both histograms one obtains a very clear impression of the meaning of the error $\Delta\bar{x}$ of the mean $\bar{x}$ for a single sample, as computed according to (6.2.8). It gives (within its error) the standard deviation of the population from which the sample mean $\bar{x}$ is drawn. If many samples are successively taken (i.e., if the experiment is repeated many times), then the frequency distribution of the values $\bar{x}$ follows a Gaussian distribution about the population mean with standard deviation $\Delta\bar{x}$. The corresponding considerations also hold for the quantities s², s, and their errors Δs² and Δs. ■
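A compact version of this Monte Carlo study can be written without the book's classes. The sketch below (hypothetical class SampleMonteCarlo, java.util.Random with a fixed seed) generates 1000 samples of size 100 and compares the observed spread of the sample means with the average error of the mean from (6.2.8).

```java
import java.util.Random;

/** Hypothetical sketch of the Monte Carlo study of Example 6.4. */
final class SampleMonteCarlo {

    /** Draws nSamples samples of the given size from N(0,1). Returns
     *  {standard deviation of the sample means, average error of the mean s / sqrt(n)}. */
    static double[] run(int nSamples, int size, long seed) {
        Random rng = new Random(seed);
        double[] means = new double[nSamples];
        double[] x = new double[size];
        double sumErr = 0.0;
        for (int s = 0; s < nSamples; s++) {
            double sum = 0.0;
            for (int i = 0; i < size; i++) {
                x[i] = rng.nextGaussian();
                sum += x[i];
            }
            double mean = sum / size;                        // (6.1.6)
            double q = 0.0;
            for (double xi : x) {
                q += (xi - mean) * (xi - mean);
            }
            means[s] = mean;
            sumErr += Math.sqrt(q / (size - 1)) / Math.sqrt(size);   // error of the mean, (6.2.8)
        }
        double m = 0.0;
        for (double v : means) {
            m += v;
        }
        m /= nSamples;
        double qMeans = 0.0;
        for (double v : means) {
            qMeans += (v - m) * (v - m);
        }
        return new double[] {Math.sqrt(qMeans / (nSamples - 1)), sumErr / nSamples};
    }

    public static void main(String[] args) {
        double[] r = run(1000, 100, 5L);
        System.out.printf("spread of the 1000 means: %.4f, average error of the mean: %.4f%n", r[0], r[1]);
    }
}
```

Both numbers should come out close to 0.1 = 1/√100, which is the agreement between the histograms for the mean and for its error described in the example.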
Fig. 6.3: Histograms of the quantities $\bar{x}$, $\Delta\bar{x}$, s, Δs, s², and Δs² from 1000 samples of size 100 from the standard normal distribution.


If  the  elements  of  the  sample  depend  on  two  random  variables  x  and 
y,  then  one  can  construct  a  scatter  plot,  where  each  element  is  represented 
as  a  point  in  a  Cartesian  coordinate  system  for  the  variables  x  and  y.  Such 
a  two-dimensional  scatter  plot  provides  useful  qualitative  information  about 
the  relationship  between  the  two  variables. 

The  class  GraphicsWith2DScatterDiagram  generates  such  a 
diagram  by  a  single  call.  (A  plot  in  the  format  A5  landscape  is  generated, 
into  which  the  scatter  diagram,  itself  in  square  format,  is  fitted.  If  another  plot 
format  or  edge  ratio  of  the  diagram  is  desired,  the  class  has  to  be  adapted 
accordingly.) 




Example  6.5:  Two-dimensional  scatter  plot:  Dividend  versus  price  for 
industrial  stocks 

Table 6.2 contains a list of the first 10 of 226 data sets, in which the dividend in 1967 (first column) and the share price on December 31, 1967 (second column) are given for a number of industrial stocks, namely for all German corporations worth more than 10 million marks. The third column shows the company name. The scatter plot of the number pairs (share price, dividend) is shown in Fig. 6.4.
Fig. 6.4: Scatter plot of dividend D versus share price P for industrial stocks.


As expected, we see a strong correlation between dividend and share price. One can see, however, that the dividend does not grow linearly with the price. It appears that factors other than immediate profit determine the price of a stock.

Also shown are histograms for the share price (Fig. 6.5) and the dividend (Fig. 6.6), which can be obtained as projections of the scatter plot onto the abscissa and ordinate. One clearly observes a non-statistical behavior for the dividends. The dividend is given as a percentage of the nominal value and is therefore almost always an integer. One sees that even numbers are considerably more frequent than odd numbers. ■
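The projection onto an axis can be sketched as follows. This is a minimal illustration, not the book's graphics class: the class name ScatterProjection and its method are hypothetical, unit-width bins are an assumption, and the input consists only of the eight rows of Table 6.2 whose dividend entry is legible, so it says nothing about the full set of 226 stocks.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: project (price, dividend) points onto one axis and count them in unit-width bins. */
class ScatterProjection {

    /** Histogram of coordinate 'axis' (0 = abscissa, 1 = ordinate); bin b collects values in [b, b+1). */
    static TreeMap<Integer, Integer> project(double[][] points, int axis) {
        TreeMap<Integer, Integer> hist = new TreeMap<>();
        for (double[] p : points) {
            int bin = (int) Math.floor(p[axis]);   // left edge of the unit-width bin
            hist.merge(bin, 1, Integer::sum);      // increment the count of that bin
        }
        return hist;
    }

    public static void main(String[] args) {
        // (share price P, dividend D) for the rows of Table 6.2 with a legible dividend
        double[][] points = {
            {133, 12}, {346, 17}, {765, 25}, {355, 16},
            {315, 20}, {295, 16}, {479, 20}, {201, 10}
        };
        for (Map.Entry<Integer, Integer> e : project(points, 1).entrySet()) {
            System.out.println("D = " + e.getKey() + "  count = " + e.getValue());
        }
    }
}
```

Projecting onto axis 0 instead gives the share-price histogram; for prices one would in practice choose a much wider bin.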
Table 6.2: Dividend, share price of a stock, and company name.

Dividend   Share price   Company name
  12.         133.       ACKERMANN-GOEGGINGEN
              417.       ADLERWERKE KLEYER
  17.         346.       AGROB AG FUER GROB U. FEINKERAMIK
  25.         765.       AG. F. ENERGIEWIRTSCHAFT
  16.         355.       AG F. LICHT- U. KRAFTVERS.,MCHN.
  20.         315.       AG.F. IND.U.VERKEHRSW.
              138.       AG. WESER
  16.         295.       AEG ALLG.ELEKTR.-GES.
  20.         479.       ANDREAE-NORIS ZAHN
  10.         201.       ANKERWERKE

6.4  Samples  from  Partitioned  Populations 

It is often advantageous to divide a population $G$ (e.g., all of the students in Europe) into various subpopulations $G_1, G_2, \ldots, G_t$ (students at university $1, 2, \ldots, t$). Suppose a quantity of interest x follows in the various subpopulations the probability densities $f_1(x), f_2(x), \ldots, f_t(x)$. The distribution function corresponding to $f_i(x)$ is then
Fig. 6.6: Histogram of the dividend of industrial stocks.


$$F_i(x) = \int_{-\infty}^{x} f_i(x)\,\mathrm{d}x = P(\mathrm{x} < x \mid \mathrm{x} \in G_i). \tag{6.4.1}$$


This is equal to the conditional probability for $\mathrm{x} < x$, given that $\mathrm{x}$ is contained in the subpopulation $G_i$. The rule of total probability (2.3.4) provides the relationship between the various $F_i(x)$ and the distribution function $F(x)$ for $G$,


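$$F(x) = P(\mathrm{x} < x \mid \mathrm{x} \in G) = \sum_{i=1}^{t} P(\mathrm{x} < x \mid \mathrm{x} \in G_i)\,P(\mathrm{x} \in G_i),$$

i.e.,

$$F(x) = \sum_{i=1}^{t} P(\mathrm{x} \in G_i)\,F_i(x). \tag{6.4.2}$$

Correspondingly one has for the probability density

$$f(x) = \sum_{i=1}^{t} P(\mathrm{x} \in G_i)\,f_i(x). \tag{6.4.3}$$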
If we now abbreviate $P(\mathrm{x} \in G_i)$ by $p_i$, then one has

$$\hat x = E(\mathrm{x}) = \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x = \sum_{i=1}^{t} p_i \int_{-\infty}^{\infty} x f_i(x)\,\mathrm{d}x,$$

$$\hat x = \sum_{i=1}^{t} p_i\,\hat x_i. \tag{6.4.4}$$


The population mean is thus the mean of the individual means of the subpopulations, each weighted by the probability of its subpopulation. For the population variance one obtains


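$$\begin{aligned}
\sigma^2(\mathrm{x}) &= \int_{-\infty}^{\infty} (x - \hat x)^2 f(x)\,\mathrm{d}x = \int_{-\infty}^{\infty} (x - \hat x)^2 \sum_{i=1}^{t} p_i f_i(x)\,\mathrm{d}x \\
&= \sum_{i=1}^{t} p_i \int_{-\infty}^{\infty} \{(x - \hat x_i) + (\hat x_i - \hat x)\}^2 f_i(x)\,\mathrm{d}x .
\end{aligned}$$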


The cross terms vanish, since $\int_{-\infty}^{\infty} (x - \hat x_i) f_i(x)\,\mathrm{d}x = 0$ by the definition of the subpopulation mean $\hat x_i$, leading to


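$$\sigma^2(\mathrm{x}) = \sum_{i=1}^{t} p_i \left\{ \int_{-\infty}^{\infty} (x - \hat x_i)^2 f_i(x)\,\mathrm{d}x + (\hat x_i - \hat x)^2 \int_{-\infty}^{\infty} f_i(x)\,\mathrm{d}x \right\},$$

$$\sigma^2(\mathrm{x}) = \sum_{i=1}^{t} p_i \{\sigma_i^2 + (\hat x_i - \hat x)^2\}. \tag{6.4.5}$$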

One thus obtains the weighted mean of a sum of two terms. The first is the variance $\sigma_i^2$ of a subpopulation; the second is the squared deviation of the mean of this subpopulation from the mean of the whole population.
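As a numerical illustration of (6.4.4) and (6.4.5), the following minimal Java sketch evaluates the population mean and variance from the $p_i$, $\hat x_i$ and $\sigma_i$, and compares them with a simulation in which each element is first assigned to subpopulation $i$ with probability $p_i$. The class name, the choice of Gaussian subpopulations, and all numerical values are assumptions made only for this example.

```java
import java.util.Random;

/** Sketch of Eqs. (6.4.4) and (6.4.5) for a population composed of t subpopulations. */
class PartitionedPopulation {

    /** Population mean: sum of p_i * mean_i, Eq. (6.4.4). */
    static double mean(double[] p, double[] mean) {
        double m = 0.0;
        for (int i = 0; i < p.length; i++) m += p[i] * mean[i];
        return m;
    }

    /** Population variance: sum of p_i * (sigma_i^2 + (mean_i - mean)^2), Eq. (6.4.5). */
    static double variance(double[] p, double[] mean, double[] sigma) {
        double m = mean(p, mean), v = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = mean[i] - m;
            v += p[i] * (sigma[i] * sigma[i] + d * d);
        }
        return v;
    }

    /** Monte Carlo: returns {mean, variance} of n elements drawn from the whole population. */
    static double[] simulate(double[] p, double[] mean, double[] sigma, int n, long seed) {
        Random rnd = new Random(seed);
        double sum = 0.0, sumSq = 0.0;
        for (int k = 0; k < n; k++) {
            double u = rnd.nextDouble();
            int i = 0;
            double cum = p[0];
            while (u >= cum && i < p.length - 1) { i++; cum += p[i]; }  // pick subpopulation i
            double x = mean[i] + sigma[i] * rnd.nextGaussian();        // Gaussian element of G_i
            sum += x;
            sumSq += x * x;
        }
        double m = sum / n;
        return new double[] {m, sumSq / n - m * m};
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.3};      // hypothetical probabilities p_i
        double[] mu = {0.0, 5.0};     // hypothetical subpopulation means
        double[] sigma = {1.0, 2.0};  // hypothetical subpopulation standard deviations
        double[] mc = simulate(p, mu, sigma, 200000, 7L);
        System.out.println("formula: mean " + mean(p, mu) + ", variance " + variance(p, mu, sigma));
        System.out.println("simulation: mean " + mc[0] + ", variance " + mc[1]);
    }
}
```

With these inputs the formulas give a mean of 1.5 and a variance of 7.15; most of the variance comes from the second term of (6.4.5), the spread of the subpopulation means.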

Having discussed separating a population into parts, we now take from each subpopulation $G_i$ a sample of size $n_i$ (with $\sum_{i=1}^{t} n_i = n$) and examine the arithmetic mean of the total partitioned sample

$$\bar x_p = \frac{1}{n}\sum_{i=1}^{t}\sum_{j=1}^{n_i} x_{ij} = \frac{1}{n}\sum_{i=1}^{t} n_i\,\bar x_i, \tag{6.4.6}$$
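with the expectation value and variance

$$E(\bar x_p) = \frac{1}{n}\sum_{i=1}^{t} n_i\,\hat x_i, \tag{6.4.7}$$

$$\sigma^2(\bar x_p) = E\{(\bar x_p - E(\bar x_p))^2\} = E\left\{\left[\sum_{i=1}^{t}\frac{n_i}{n}(\bar x_i - \hat x_i)\right]^2\right\} = \frac{1}{n^2}\sum_{i=1}^{t} n_i^2\,E\{(\bar x_i - \hat x_i)^2\} = \frac{1}{n^2}\sum_{i=1}^{t} n_i^2\,\sigma^2(\bar x_i). \tag{6.4.8}$$

Using (6.2.3), i.e., $\sigma^2(\bar x_i) = \sigma_i^2/n_i$, this is finally

$$\sigma^2(\bar x_p) = \frac{1}{n^2}\sum_{i=1}^{t} n_i\,\sigma_i^2. \tag{6.4.9}$$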


One would obtain the same result by application of the law of error propagation (3.8.7) to Eq. (6.4.6).

It is clear that the arithmetic mean $\bar x_p$ cannot in general be an unbiased estimator for the population mean $\hat x$, since it depends on the arbitrary choice of the sizes $n_i$ of the samples from the subpopulations. A comparison of (6.4.7) with (6.4.4) shows that it is unbiased only for the special case $p_i = n_i/n$.

The population mean $\hat x$ can be estimated in the following way. One first determines the means $\bar x_i$ for the subpopulations and then constructs the expression

$$\tilde x = \sum_{i=1}^{t} p_i\,\bar x_i \tag{6.4.10}$$

in analogy to Eq. (6.4.4). By error propagation one obtains for the variance of $\tilde x$

$$\sigma^2(\tilde x) = \sum_{i=1}^{t} p_i^2\,\sigma^2(\bar x_i) = \sum_{i=1}^{t}\frac{p_i^2\,\sigma_i^2}{n_i}. \tag{6.4.11}$$
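The variance formula (6.4.11) can be checked by simulation. The sketch below, with a hypothetical class name and assumed Gaussian subpopulations ($p_i = 0.7, 0.3$; means 0 and 5; standard deviations 1 and 2; $n_1 = 14$, $n_2 = 6$), repeats the whole stratified experiment many times and compares the scatter of $\tilde x = \sum p_i \bar x_i$ with $\sum p_i^2 \sigma_i^2 / n_i$.

```java
import java.util.Random;

/** Sketch of Eqs. (6.4.10) and (6.4.11): the estimator x~ = sum p_i * xbar_i and its variance. */
class StratifiedMean {

    /** Predicted variance of x~, Eq. (6.4.11): sum p_i^2 sigma_i^2 / n_i. */
    static double predictedVariance(double[] p, double[] sigma, int[] n) {
        double v = 0.0;
        for (int i = 0; i < p.length; i++) v += p[i] * p[i] * sigma[i] * sigma[i] / n[i];
        return v;
    }

    /** One realization of x~: n_i Gaussian values from subpopulation i, averaged and weighted with p_i. */
    static double estimate(double[] p, double[] mu, double[] sigma, int[] n, Random rnd) {
        double xTilde = 0.0;
        for (int i = 0; i < p.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < n[i]; j++) sum += mu[i] + sigma[i] * rnd.nextGaussian();
            xTilde += p[i] * sum / n[i];
        }
        return xTilde;
    }

    /** Returns {mean, variance} of x~ over 'reps' repetitions of the whole experiment. */
    static double[] repeat(double[] p, double[] mu, double[] sigma, int[] n, int reps, long seed) {
        Random rnd = new Random(seed);
        double s = 0.0, s2 = 0.0;
        for (int r = 0; r < reps; r++) {
            double x = estimate(p, mu, sigma, n, rnd);
            s += x;
            s2 += x * x;
        }
        double m = s / reps;
        return new double[] {m, (s2 - reps * m * m) / (reps - 1)};
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.3}, mu = {0.0, 5.0}, sigma = {1.0, 2.0};  // hypothetical values
        int[] n = {14, 6};
        double[] mc = repeat(p, mu, sigma, n, 20000, 1L);
        System.out.printf("simulated mean %.4f (population mean 1.5)%n", mc[0]);
        System.out.printf("simulated variance %.5f, Eq. (6.4.11): %.5f%n", mc[1], predictedVariance(p, sigma, n));
    }
}
```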


Example  6.6:  Optimal  choice  of  the  sample  size  for  subpopulations 

In order to minimize the variance $\sigma^2(\tilde x)$, we cannot simply differentiate the relation (6.4.11) with respect to all $n_i$, since the $n_i$ must satisfy a constraint, namely

$$\sum_{i=1}^{t} n_i - n = 0. \tag{6.4.12}$$

We must therefore use the method of Lagrange multipliers, multiplying Eq. (6.4.12) by a factor $\mu$, adding this to Eq. (6.4.11), and finally setting the partial derivatives of the sum with respect to the $n_i$ and to $\mu$ equal to zero:


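$$L = \sigma^2(\tilde x) + \mu\left(\sum_i n_i - n\right) = \sum_i \frac{p_i^2}{n_i}\,\sigma_i^2 + \mu\left(\sum_i n_i - n\right),$$

$$\frac{\partial L}{\partial n_i} = -\frac{p_i^2\,\sigma_i^2}{n_i^2} + \mu = 0, \tag{6.4.13}$$

$$\frac{\partial L}{\partial \mu} = \sum_i n_i - n = 0. \tag{6.4.14}$$

From (6.4.13) we obtain

$$n_i = \frac{p_i\,\sigma_i}{\sqrt{\mu}}.$$

Together with (6.4.14) this gives

$$\frac{1}{\sqrt{\mu}} = \frac{n}{\sum_i p_i\,\sigma_i}$$

and therefore

$$n_i = \frac{n\,p_i\,\sigma_i}{\sum_j p_j\,\sigma_j}. \tag{6.4.15}$$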

The result (6.4.15) states that the sizes $n_i$ of the samples from the subpopulations $i$ should be chosen in such a way that they are proportional to the probability $p_i$ of subpopulation $i$, weighted with the corresponding standard deviation $\sigma_i$.

As an example, assume that a scientific publishing company wants to estimate the total amount spent on scientific books by two subpopulations: (1) students and (2) scientific libraries. Further, we will assume that there are 1000 libraries and $10^6$ students in the population, and that the standard deviation of the money spent by students is 100 dollars and for libraries (which are of greatly differing sizes) $3\cdot10^5$ dollars. We then have

$$p_1 \approx 1, \qquad p_2 \approx 10^{-3}, \qquad \sigma_1 = 100, \qquad \sigma_2 = 3\times10^{5},$$

and from (6.4.15)

$$n_1 = \text{const}\cdot 100, \qquad n_2 = \text{const}\cdot 300, \qquad n_2 = 3n_1.$$
Note that the result does not depend on the means of the subpopulations. The quantities $p_i$ and $\sigma_i$ are in general unknown. They must first be estimated from preliminary samples. ■
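The allocation rule (6.4.15) is a one-line computation. The following sketch (hypothetical class name) applies it to the publisher example; the total sample size $n = 400$ is an assumed value, since the example fixes only the ratio $n_2/n_1$.

```java
/** Sketch of the optimal allocation (6.4.15): n_i = n p_i sigma_i / sum_j p_j sigma_j. */
class OptimalAllocation {

    static double[] allocate(int n, double[] p, double[] sigma) {
        double norm = 0.0;
        for (int i = 0; i < p.length; i++) norm += p[i] * sigma[i];  // sum_j p_j sigma_j
        double[] ni = new double[p.length];
        for (int i = 0; i < p.length; i++) ni[i] = n * p[i] * sigma[i] / norm;
        return ni;
    }

    public static void main(String[] args) {
        // Publisher example: 10^6 students (sigma = 100) and 1000 libraries (sigma = 3*10^5).
        double students = 1e6, libraries = 1e3;
        double[] p = {students / (students + libraries), libraries / (students + libraries)};
        double[] sigma = {100.0, 3e5};
        double[] ni = allocate(400, p, sigma);  // total sample size 400 is a hypothetical choice
        System.out.printf("students: %.1f, libraries: %.1f%n", ni[0], ni[1]);
    }
}
```

For these inputs it prints `students: 100.0, libraries: 300.0`, reproducing $n_2 = 3n_1$: although libraries make up only about 0.1 % of the population, three quarters of the sample should be taken from them because of their large spread.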

The discussion of subpopulations will be taken up again in Chap. 11.


6.5  Samples  Without  Replacement  from  Finite  Discrete 
Populations.  Mean  Square  Deviation.  Degrees 
of  Freedom 


We first encountered the concept of a sample in connection with the hypergeometric distribution (Sect. 5.3). There we determined that the independence of the individual sample elements was lost by the process of taking elements without replacing them from a finite (and hence discrete) population. We are therefore no longer dealing with genuine random sampling, even if no particular choice among the remaining elements is made.

To discuss this further, let us introduce the following notation. Suppose the population consists of $N$ elements $y_1, y_2, \ldots, y_N$. From it we take a sample of size $n$ with the elements $x_1, x_2, \ldots, x_n$. (In the hypergeometric distribution, the $y_j$ and hence the $x_i$ could only take on the values 0 and 1.)

Since it is equally probable for each of the remaining elements $y_j$ to be chosen, we obtain for the expectation value

$$E(\mathrm{y}) = \hat y = \bar y = \frac{1}{N}\sum_{j=1}^{N} y_j. \tag{6.5.1}$$

Although $\hat y$ is not a random variable, this expression is the arithmetic mean of a finite number of elements of the population. A definition of the population variance encounters the difficulties discussed at the end of Sect. 6.2. We define it in analogy to (6.2.6) as

$$\sigma^2(\mathrm{y}) = \frac{1}{N-1}\sum_{j=1}^{N}(y_j - \bar y)^2. \tag{6.5.2}$$

Let us now consider the sum of squares

$$\sum_{j=1}^{N}(y_j - \bar y)^2. \tag{6.5.3}$$
Since we have not constrained the population in any way, the $y_j$ can take on all possible values. Therefore the first element in the sum in (6.5.2) can also take on any of the possible values. The same holds for the 2nd, 3rd, ..., $(N-1)$th terms summed. The $N$th term in the sum is then, however, fixed, since

$$\sum_{j=1}^{N}(y_j - \bar y) = 0. \tag{6.5.4}$$

We say that the number of degrees of freedom of the sum of squares (6.5.3) is $N - 1$. One can illustrate this connection geometrically. We consider the case $\bar y = 0$ and construct an $N$-dimensional vector space with the $y_j$. The quadratic sum (6.5.3) is then the square of the absolute value of the position vector in this space. Because of the equation of constraint (6.5.4), the tip of the position vector can only move in a space of dimension $N - 1$. In mechanics the dimension of such a constrained space is called the number of degrees of freedom. This is sketched in Fig. 6.7 for the case $N = 2$. Here the position vector is constrained to lie on the line $y_2 = -y_1$.


Fig. 6.7: A sample of size two gives a sum of squares with one degree of freedom.


A sum of squares divided by the number of degrees of freedom, i.e., an expression of the form (6.5.2), is called a mean square or, to be more complete (since we are dealing with the differences of the individual values from the expectation or mean value), a mean square deviation. The square root of this expression, which is then a measure of the dispersion, is called the root mean square (RMS) deviation.
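The quantities just defined can be computed directly. This short sketch (hypothetical class name and data) evaluates the sum of deviations, which vanishes up to rounding because of the constraint (6.5.4), the mean square deviation (6.5.2) with divisor $N - 1$, and its square root, the RMS deviation.

```java
/** Sketch of (6.5.2)-(6.5.4): sum of deviations, mean square deviation and RMS deviation of N values. */
class MeanSquare {

    static double mean(double[] y) {
        double s = 0.0;
        for (double v : y) s += v;
        return s / y.length;
    }

    /** Sum of deviations from the mean; zero up to rounding, Eq. (6.5.4). */
    static double sumOfDeviations(double[] y) {
        double m = mean(y), s = 0.0;
        for (double v : y) s += v - m;
        return s;
    }

    /** Sum of squares (6.5.3) divided by its N - 1 degrees of freedom, Eq. (6.5.2). */
    static double meanSquareDeviation(double[] y) {
        double m = mean(y), ss = 0.0;
        for (double v : y) ss += (v - m) * (v - m);
        return ss / (y.length - 1);
    }

    /** Root mean square (RMS) deviation. */
    static double rmsDeviation(double[] y) {
        return Math.sqrt(meanSquareDeviation(y));
    }

    public static void main(String[] args) {
        double[] y = {2.0, 4.0, 4.0, 5.0, 7.0, 8.0};  // hypothetical population values
        System.out.println("sum of deviations    = " + sumOfDeviations(y));
        System.out.println("mean square deviation = " + meanSquareDeviation(y));
        System.out.println("RMS deviation         = " + rmsDeviation(y));
    }
}
```

Here the deviations from the mean 5 are -3, -1, -1, 0, 2, 3; their squares sum to 24, which divided by the 5 degrees of freedom gives a mean square deviation of 4.8.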

We now return to the sample $x_1, x_2, \ldots, x_n$. For simplicity of notation we introduce the Kronecker symbol, which describes the selection procedure for the sample. It is defined as

$$\delta_i^j = \begin{cases} 1, & \text{if } x_i \text{ is the element } y_j, \\ 0 & \text{otherwise.} \end{cases} \tag{6.5.5}$$


In particular one has

$$x_i = \sum_{j=1}^{N}\delta_i^j\,y_j. \tag{6.5.6}$$

Since the selection of any of the $y_j$ as the $i$th element is equally probable, one has

$$P(\delta_i^j = 1) = 1/N. \tag{6.5.7}$$

Since $\delta_i^j$ describes a random procedure, it is clearly a random variable itself. Its expectation value is found from Eq. (3.3.2) (where $n = 2$, $x_1 = 0$, $x_2 = 1$) to be

$$E(\delta_i^j) = P(\delta_i^j = 1) = 1/N. \tag{6.5.8}$$

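If now one element $x_i$ of the sample is determined, one then has only $N - 1$ selection possibilities out of the population for a further element, e.g., $x_k$. That is, for $k \ne i$ and $\ell \ne j$,

$$P(\delta_i^j\,\delta_k^\ell = 1) = \frac{1}{N}\,\frac{1}{N-1} = E(\delta_i^j\,\delta_k^\ell). \tag{6.5.9}$$

Since the sample is taken without replacement, the same element $y_j$ cannot be drawn twice, i.e., for $k \ne i$,

$$\delta_i^j\,\delta_k^j = 0. \tag{6.5.10}$$

Similarly one has, for $\ell \ne j$,

$$\delta_i^j\,\delta_i^\ell = 0, \tag{6.5.11}$$

since two different elements of the population cannot simultaneously occur as the $i$th element of the sample.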

We consider now the expectation value of $x_i$,

$$E(x_i) = E\left\{\sum_{j=1}^{N}\delta_i^j\,y_j\right\} = \sum_{j=1}^{N} y_j\,E(\delta_i^j) = \frac{1}{N}\sum_{j=1}^{N} y_j = \bar y. \tag{6.5.12}$$


Since $x_i$ is not in any way special, the expectation values of all elements of the sample, and thus also of their arithmetic mean, have the same value

$$E(\bar x) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \bar y. \tag{6.5.13}$$

The arithmetic mean of the sample is thus an unbiased estimator for the population mean.

Next we consider the sample variance

$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2. \tag{6.5.14}$$
By means of a somewhat longer calculation it can be shown that its expectation value is

$$E(s_x^2) = \sigma^2(\mathrm{y}). \tag{6.5.15}$$

The sample variance is thus an unbiased estimator for the population variance.
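The variance of the mean is also of interest:

$$\sigma^2(\bar x) = E\{(\bar x - E(\bar x))^2\}.$$

$E(\bar x) = \bar y$ is, however, a fixed, not a random quantity, whereas $\bar x$ depends on the individual sample and is hence a random variable. One therefore has

$$\sigma^2(\bar x) = E(\bar x^2) - \bar y^2,$$

which, evaluated with the help of (6.5.6) through (6.5.11), gives

$$\sigma^2(\bar x) = \frac{\sigma^2(\mathrm{y})}{n}\left(1 - \frac{n}{N}\right). \tag{6.5.16}$$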


Comparing with the case of an infinite continuous population (6.2.3), one sees the additional factor $(1 - n/N)$. This corresponds to the fact that the variance of $\bar x$ vanishes in the case $n = N$, where the "sample" contains the entire population and where one has exactly $\bar x = \bar y$.
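Because the population is finite, (6.5.13), (6.5.15) and (6.5.16) can be verified exactly by brute force. The sketch below enumerates every ordered sample of size $n$ drawn without replacement from a small hypothetical population (all $N(N-1)\cdots(N-n+1)$ such samples being equally probable) and averages over them; the class name and data are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: exact moments of xbar and s_x^2 over all ordered samples drawn without replacement. */
class FinitePopulationSampling {

    /** Collects all ordered index tuples of length n with distinct entries from 0..N-1. */
    static void enumerate(int N, int n, int[] current, boolean[] used, int depth, List<int[]> out) {
        if (depth == n) { out.add(current.clone()); return; }
        for (int j = 0; j < N; j++) {
            if (used[j]) continue;            // without replacement: each y_j at most once
            used[j] = true;
            current[depth] = j;
            enumerate(N, n, current, used, depth + 1, out);
            used[j] = false;
        }
    }

    /** Returns {E(xbar), E(s_x^2), sigma^2(xbar)}, each averaged over all equally probable samples. */
    static double[] exactMoments(double[] y, int n) {
        List<int[]> samples = new ArrayList<>();
        enumerate(y.length, n, new int[n], new boolean[y.length], 0, samples);
        double eMean = 0.0, eS2 = 0.0, eMean2 = 0.0;
        for (int[] idx : samples) {
            double m = 0.0;
            for (int j : idx) m += y[j];
            m /= n;
            double ss = 0.0;
            for (int j : idx) ss += (y[j] - m) * (y[j] - m);
            eMean += m;
            eS2 += ss / (n - 1);              // sample variance, Eq. (6.5.14)
            eMean2 += m * m;
        }
        int count = samples.size();
        eMean /= count; eS2 /= count; eMean2 /= count;
        return new double[] {eMean, eS2, eMean2 - eMean * eMean};
    }

    /** Population variance with divisor N - 1, Eq. (6.5.2). */
    static double populationVariance(double[] y) {
        double m = 0.0;
        for (double v : y) m += v;
        m /= y.length;
        double ss = 0.0;
        for (double v : y) ss += (v - m) * (v - m);
        return ss / (y.length - 1);
    }

    public static void main(String[] args) {
        double[] y = {1.0, 3.0, 4.0, 8.0, 9.0};  // hypothetical population, N = 5
        int n = 3;
        double[] e = exactMoments(y, n);
        double var = populationVariance(y);
        System.out.println("E(xbar)   = " + e[0] + "   (population mean 5.0)");
        System.out.println("E(s^2)    = " + e[1] + "   (Eq. 6.5.2: " + var + ")");
        System.out.println("var(xbar) = " + e[2] + "   (Eq. 6.5.16: " + var / n * (1.0 - (double) n / y.length) + ")");
    }
}
```

For the values in main the three printed expectations agree, up to rounding, with 5.0, 11.5 and $11.5/3 \cdot (1 - 3/5) \approx 1.533$; without the factor $(1 - n/N)$ the variance of the mean would be overstated by a factor 2.5.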


6.6 Samples from Gaussian Distributions: $\chi^2$-Distribution


We return now to continuously distributed populations and consider in particular a Gaussian distribution with mean $a$ and variance $\sigma^2$. According to (5.7.7), the characteristic function of such a Gaussian distribution is

$$\varphi_x(t) = \exp(\mathrm{i}ta)\,\exp\left(-\tfrac{1}{2}\sigma^2 t^2\right). \tag{6.6.1}$$


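We now take a sample of size $n$ from the population. The characteristic function of the sample mean was given in (6.2.2) in terms of the characteristic function of the population. From this we have

$$\varphi_{\bar x}(t) = \left\{\varphi_x\!\left(\frac{t}{n}\right)\right\}^n = \exp(\mathrm{i}ta)\,\exp\left(-\frac{\sigma^2 t^2}{2n}\right). \tag{6.6.2}$$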


If we consider $(\bar x - a) = (\bar x - \hat x)$ in place of $\bar x$, then one obtains

$$\varphi_{\bar x - a}(t) = \exp\left(-\frac{\sigma^2 t^2}{2n}\right). \tag{6.6.3}$$


This is again the characteristic function of a normal distribution, but with a different variance,

$$\sigma^2(\bar x) = \sigma^2(x)/n. \tag{6.6.4}$$

For the simple case of a standard Gaussian distribution ($a = 0$, $\sigma^2 = 1$) we have

$$\varphi_{\bar x}(t) = \exp(-t^2/2n). \tag{6.6.5}$$

We take a sample from this distribution,

$$x_1, x_2, \ldots, x_n,$$

but we are interested in particular in the sum of the squares of the sample elements,

$$\chi^2 = x_1^2 + x_2^2 + \cdots + x_n^2. \tag{6.6.6}$$

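We want to show that the quantity $\chi^2$ follows the distribution function*

$$F(\chi^2) = \frac{1}{\Gamma(\lambda)\,2^{\lambda}}\int_0^{\chi^2} u^{\lambda-1} e^{-\frac{1}{2}u}\,\mathrm{d}u, \tag{6.6.7}$$

where

$$\lambda = \tfrac{1}{2}n. \tag{6.6.8}$$

The quantity $n$ is called the number of degrees of freedom. We first introduce the abbreviation

$$k = \frac{1}{\Gamma(\lambda)\,2^{\lambda}} \tag{6.6.9}$$

and determine the probability density to be

$$f(\chi^2) = k\,(\chi^2)^{\lambda-1} e^{-\frac{1}{2}\chi^2}. \tag{6.6.10}$$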

For two degrees of freedom, the probability density is clearly an exponential function. First we want to prove what was claimed by (6.6.7) for one degree of freedom ($\lambda = 1/2$). Thus we ask for the probability that $x^2 < \chi^2$, or rather, that $-\sqrt{\chi^2} < x < +\sqrt{\chi^2}$. This is

$$F(\chi^2) = P(x^2 < \chi^2) = P(-\sqrt{\chi^2} < x < +\sqrt{\chi^2}) = \frac{1}{\sqrt{2\pi}}\int_{-\sqrt{\chi^2}}^{+\sqrt{\chi^2}} e^{-\frac{1}{2}x^2}\,\mathrm{d}x = \frac{2}{\sqrt{2\pi}}\int_0^{\sqrt{\chi^2}} e^{-\frac{1}{2}x^2}\,\mathrm{d}x.$$


*The symbol $\chi^2$ (chi squared) was introduced by K. Pearson. Although it is written as something squared, which reminds one of its origin as a sum of squares, it is treated as a usual random variable.
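Setting $x^2 = u$, $\mathrm{d}u = 2x\,\mathrm{d}x$, we obtain directly

$$F(\chi^2) = \frac{1}{\sqrt{2\pi}}\int_0^{\chi^2} u^{-\frac{1}{2}} e^{-\frac{1}{2}u}\,\mathrm{d}u, \tag{6.6.11}$$

which is (6.6.7) for $\lambda = 1/2$, since $\Gamma(\tfrac{1}{2})\,2^{1/2} = \sqrt{\pi}\,\sqrt{2} = \sqrt{2\pi}$.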

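To prove the general case we first find the characteristic function of the $\chi^2$-distribution to be

$$\varphi_{\chi^2}(t) = \int_0^{\infty} k\,(\chi^2)^{\lambda-1}\exp\left(-\tfrac{1}{2}\chi^2 + \mathrm{i}t\chi^2\right)\mathrm{d}\chi^2 \tag{6.6.12}$$

or, with $(\tfrac{1}{2} - \mathrm{i}t)\chi^2 = \nu$,

$$\varphi_{\chi^2}(t) = 2^{\lambda}(1 - 2\mathrm{i}t)^{-\lambda}\,k\int_0^{\infty}\nu^{\lambda-1}e^{-\nu}\,\mathrm{d}\nu.$$

The integral on the right side is, according to (D.1.1), equal to $\Gamma(\lambda)$. One therefore has

$$\varphi_{\chi^2}(t) = (1 - 2\mathrm{i}t)^{-\lambda}. \tag{6.6.13}$$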

If we now consider the case of a second distribution with $\lambda'$, then one has

$$\varphi'_{\chi^2}(t) = (1 - 2\mathrm{i}t)^{-\lambda'}.$$

Since the characteristic function of a sum of independent variables is equal to the product of their characteristic functions, one has the following important theorem:

The sum of two independent $\chi^2$ variables with $n_1$, $n_2$ degrees of freedom itself follows a $\chi^2$-distribution with $n = n_1 + n_2$ degrees of freedom.

This theorem can now be used to easily generalize the claim (6.6.7), proven up to now only for $n = 1$. The proof follows from the fact that the individual terms of the sum of squares are independent, and therefore (6.6.6) can be treated as the sum of $n$ different $\chi^2$ variables, each with one degree of freedom.

In order to obtain the expectation value and variance of the $\chi^2$-distribution, we use the characteristic function, whose derivatives (5.5.7) give the moments about the origin. We obtain

$$E(\chi^2) = -\mathrm{i}\varphi'(0) = 2\lambda, \qquad E(\chi^2) = n, \tag{6.6.14}$$

and

$$E\{(\chi^2)^2\} = -\varphi''(0) = 4\lambda^2 + 4\lambda,$$

$$\sigma^2(\chi^2) = E\{(\chi^2)^2\} - \{E(\chi^2)\}^2 = 4\lambda, \qquad \sigma^2(\chi^2) = 2n. \tag{6.6.15}$$


The expectation value of the $\chi^2$-distribution is thus equal to the number of degrees of freedom, and the variance is two times larger. Figure 6.8 shows the probability density of the $\chi^2$-distribution for various values of $n$. One sees [as can also be seen directly from (6.6.10)] that for $\chi^2 = 0$ the function diverges for $n = 1$, is equal to 1/2 for $n = 2$, and vanishes for $n \ge 3$. A table of the $\chi^2$-distribution is provided in the appendix (Table I.6).
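The results $E(\chi^2) = n$ and $\sigma^2(\chi^2) = 2n$ are easy to check by simulation. This minimal sketch (hypothetical class name, arbitrary seed) sums the squares of $n$ standard normal numbers from `java.util.Random.nextGaussian()` as in (6.6.6) and reports the sample mean and variance of many such sums.

```java
import java.util.Random;

/** Sketch: Monte Carlo check that a sum of n squared standard normal numbers has mean n and variance 2n. */
class ChiSquareMoments {

    /** Returns {sample mean, sample variance} of 'trials' simulated chi^2 values with n degrees of freedom. */
    static double[] simulate(int n, int trials, long seed) {
        Random rnd = new Random(seed);
        double s = 0.0, s2 = 0.0;
        for (int t = 0; t < trials; t++) {
            double chi2 = 0.0;
            for (int i = 0; i < n; i++) {
                double x = rnd.nextGaussian();
                chi2 += x * x;                      // Eq. (6.6.6)
            }
            s += chi2;
            s2 += chi2 * chi2;
        }
        double mean = s / trials;
        return new double[] {mean, (s2 - trials * mean * mean) / (trials - 1)};
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 5, 10}) {
            double[] m = simulate(n, 100000, 42L);
            System.out.printf("n = %2d: mean %.3f (expected %d), variance %.3f (expected %d)%n",
                    n, m[0], n, m[1], 2 * n);
        }
    }
}
```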

The $\chi^2$-distribution is of great significance in many applications, where the quantity $\chi^2$ is used as a measure of confidence in a certain result. The smaller the value of $\chi^2$, the greater is the confidence in the result. (After all, $\chi^2$ was defined as the sum of squares of deviations of elements of a sample from the population mean. See Sect. 8.7.) The distribution function

$$F(\chi^2) = P(\boldsymbol{\chi}^2 < \chi^2) \tag{6.6.16}$$

gives the probability that the random variable $\boldsymbol{\chi}^2$ is not larger than $\chi^2$. In practice, one frequently uses the quantity

$$W(\chi^2) = 1 - F(\chi^2) \tag{6.6.17}$$

as a measure of confidence in a result. $W(\chi^2)$ is often called the confidence level. $W(\chi^2)$ is large for small values of $\chi^2$ and falls with increasing $\chi^2$. The distribution function (6.6.16) is shown in Fig. 6.9 for various numbers of degrees of freedom $n$. The inverse function, which gives the quantiles of the $\chi^2$-distribution,

$$\chi^2_F = \chi^2(F), \tag{6.6.18}$$

is used especially often in "hypothesis testing" (see Sect. 8.7). It is tabulated in the appendix (Table I.7).

Fig. 6.8: Probability density of $\chi^2$ for the number of degrees of freedom $n = 1, 2, \ldots, 10$. The expectation value $E(\chi^2) = n$ moves to the right as $n$ increases.
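To evaluate $F(\chi^2)$ and $W(\chi^2)$ without tables, note that the substitution $u = 2v$ in (6.6.7) turns $F(\chi^2)$ into the regularized incomplete gamma function $P(\lambda, \chi^2/2)$. The sketch below evaluates that function by its power series, together with a Lanczos approximation for $\ln\Gamma$ (coefficients as in Numerical Recipes). This is a swapped-in numerical technique, not the book's StatFunct implementation; the class name is hypothetical, and the plain series is only adequate for moderate $\chi^2$ (for large arguments a continued fraction would be preferable).

```java
/** Sketch: F(chi^2; n) = P(n/2, chi^2/2), Eq. (6.6.7), and W = 1 - F, Eq. (6.6.17). */
class ChiSquareDistribution {

    /** ln Gamma(x) for x > 0, Lanczos approximation (relative error about 2e-10). */
    static double lnGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x, tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Regularized lower incomplete gamma P(a, x) = x^a e^-x / Gamma(a) * sum_k x^k / (a(a+1)...(a+k)). */
    static double gammaP(double a, double x) {
        if (x <= 0.0) return 0.0;
        double term = 1.0 / a, sum = term;
        for (int k = 1; k < 10000; k++) {
            term *= x / (a + k);
            sum += term;
            if (term < sum * 1e-15) break;      // series has converged
        }
        return sum * Math.exp(-x + a * Math.log(x) - lnGamma(a));
    }

    /** F(chi2) = P(chi^2 variable < chi2) for n degrees of freedom. */
    static double cdf(double chi2, int n) {
        return gammaP(0.5 * n, 0.5 * chi2);
    }

    /** W(chi2) = 1 - F(chi2). */
    static double w(double chi2, int n) {
        return 1.0 - cdf(chi2, n);
    }

    public static void main(String[] args) {
        System.out.printf("W(3.84; n = 1)   = %.4f%n", w(3.84, 1));
        System.out.printf("W(18.31; n = 10) = %.4f%n", w(18.31, 10));
    }
}
```

Both printed values should be close to 0.05, since 3.84 and 18.31 are the familiar 95 % quantiles of the $\chi^2$-distribution for $n = 1$ and $n = 10$.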

Up to now we have restricted ourselves to the case where the population is described by the standard normal distribution. Usually, however, one has a normal distribution in general form with mean $a$ and variance $\sigma^2$. Then the sum of squares (6.6.6) is clearly no longer distributed according to the $\chi^2$-distribution. One immediately obtains, however, a $\chi^2$-distribution by considering the quantity

$$\chi^2 = \frac{(x_1 - a)^2 + (x_2 - a)^2 + \cdots + (x_n - a)^2}{\sigma^2}. \tag{6.6.19}$$

This result follows directly from Eq. (5.8.4).

If the expectation values $a_i$ and variances $\sigma_i^2$ of the individual variables are different, then one has

$$\chi^2 = \frac{(x_1 - a_1)^2}{\sigma_1^2} + \frac{(x_2 - a_2)^2}{\sigma_2^2} + \cdots + \frac{(x_n - a_n)^2}{\sigma_n^2}. \tag{6.6.20}$$

Fig. 6.9: Distribution function for $\chi^2$ for the number of degrees of freedom $n = 1, 2, \ldots, 20$. The function for $n = 1$ is the curve at the far left, and the function for $n = 20$ is at the far right.


Finally, if the $n$ variables are not independent but are described by a joint normal distribution (5.10.1) with the expectation values given by the vector $\mathbf{a}$ and the covariance matrix $C = B^{-1}$, then one has

$$\chi^2 = (\mathbf{x} - \mathbf{a})^{\mathrm{T}} B\,(\mathbf{x} - \mathbf{a}). \tag{6.6.21}$$
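Equation (6.6.21) can be illustrated for two variables, where $B = C^{-1}$ has a simple closed form. In this sketch (hypothetical class name and covariance values) the uncorrelated case reproduces the sum (6.6.20), while a nonzero covariance changes the result.

```java
/** Sketch of Eq. (6.6.21): chi^2 = (x - a)^T B (x - a) with B = C^{-1}, for two variables. */
class QuadraticFormChi2 {

    /** chi^2 for a symmetric 2x2 covariance matrix c; B is c inverted in closed form. */
    static double chi2(double[] x, double[] a, double[][] c) {
        double det = c[0][0] * c[1][1] - c[0][1] * c[1][0];
        double[][] b = {
            { c[1][1] / det, -c[0][1] / det},
            {-c[1][0] / det,  c[0][0] / det}
        };
        double d0 = x[0] - a[0], d1 = x[1] - a[1];
        return d0 * (b[0][0] * d0 + b[0][1] * d1) + d1 * (b[1][0] * d0 + b[1][1] * d1);
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] x = {1.0, 2.0};
        // Uncorrelated case: reduces to (6.6.20), 1^2/1 + 2^2/4 = 2.
        System.out.println(chi2(x, a, new double[][] {{1.0, 0.0}, {0.0, 4.0}}));
        // Correlated case with a hypothetical covariance of 0.5.
        System.out.println(chi2(x, a, new double[][] {{1.0, 0.5}, {0.5, 4.0}}));
    }
}
```

The two calls print 2.0 and approximately 1.6: with a positive correlation, deviations of the same sign in both variables are less improbable, so $\chi^2$ decreases.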


6.7 $\chi^2$ and Empirical Variance


In Eq. (6.2.6) we found that

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2 \tag{6.7.1}$$

is a consistent, unbiased estimator for the variance $\sigma^2$ of a population. Let the $x_i$ be independent and normally distributed with standard deviation $\sigma$. We want to show that the quantity

$$\frac{n-1}{\sigma^2}\,s^2 \tag{6.7.2}$$

follows the $\chi^2$-distribution with $f = n - 1$ degrees of freedom. We first carry out an orthogonal transformation of the $n$ variables $x_i$ (see Sect. 3.8):


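$$\begin{aligned}
y_1 &= \frac{1}{\sqrt{1\cdot 2}}\,(x_1 - x_2), \\
y_2 &= \frac{1}{\sqrt{2\cdot 3}}\,(x_1 + x_2 - 2x_3), \\
y_3 &= \frac{1}{\sqrt{3\cdot 4}}\,(x_1 + x_2 + x_3 - 3x_4), \\
&\;\;\vdots \\
y_{n-1} &= \frac{1}{\sqrt{(n-1)n}}\,\bigl(x_1 + x_2 + \cdots + x_{n-1} - (n-1)x_n\bigr), \\
y_n &= \frac{1}{\sqrt{n}}\,(x_1 + x_2 + \cdots + x_n) = \sqrt{n}\,\bar x .
\end{aligned} \tag{6.7.3}$$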


One can verify that this transformation is in fact orthogonal, i.e., that

$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2. \tag{6.7.4}$$
Since a sum or difference of independent normally distributed quantities is again normally distributed, all of the $y_i$ are normally distributed. The factors in (6.7.3) ensure that all $y_i$ have standard deviation $\sigma$; since the coefficients of $y_1, \ldots, y_{n-1}$ sum to zero, these variables have mean values of zero. Because the transformation is orthogonal, the $y_i$ are, moreover, independent.

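From (6.7.1), (6.7.3), and (6.7.4) one then has

$$\begin{aligned}
(n-1)s^2 &= \sum_{i=1}^{n}(x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - 2\bar x\sum_{i=1}^{n} x_i + n\bar x^2 \\
&= \sum_{i=1}^{n} x_i^2 - n\bar x^2 = \sum_{i=1}^{n} y_i^2 - y_n^2 = \sum_{i=1}^{n-1} y_i^2 .
\end{aligned}$$

This expression is a sum of only $n - 1$ independent squared terms. A comparison with (6.6.19) shows that the quantity (6.7.2) in fact follows a $\chi^2$-distribution with $n - 1$ degrees of freedom.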
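The orthogonality claim (6.7.4) and the identity $(n-1)s^2 = \sum_{i=1}^{n-1} y_i^2$ can be confirmed numerically. The sketch below (hypothetical class name and sample values) applies the transformation (6.7.3) using a running partial sum $x_1 + \cdots + x_k$.

```java
/** Sketch: the orthogonal transformation (6.7.3) and the identity (n-1)s^2 = y_1^2 + ... + y_{n-1}^2. */
class HelmertTransform {

    /** y_k = (x_1 + ... + x_k - k x_{k+1}) / sqrt(k(k+1)) for k < n, and y_n = sqrt(n) * xbar. */
    static double[] transform(double[] x) {
        int n = x.length;
        double[] y = new double[n];
        double partial = 0.0;                                   // running sum x_1 + ... + x_k
        for (int k = 1; k < n; k++) {
            partial += x[k - 1];
            y[k - 1] = (partial - k * x[k]) / Math.sqrt((double) k * (k + 1));
        }
        partial += x[n - 1];
        y[n - 1] = partial / Math.sqrt(n);
        return y;
    }

    /** Sum of the squares of the first 'count' entries of v. */
    static double sumOfSquares(double[] v, int count) {
        double s = 0.0;
        for (int i = 0; i < count; i++) s += v[i] * v[i];
        return s;
    }

    /** (n - 1) s^2 computed directly from the deviations x_i - xbar. */
    static double nMinusOneTimesS2(double[] x) {
        double m = 0.0;
        for (double v : x) m += v;
        m /= x.length;
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss;
    }

    public static void main(String[] args) {
        double[] x = {2.3, -0.7, 1.1, 4.0, 0.5};  // hypothetical sample
        double[] y = transform(x);
        System.out.println("sum x^2 = " + sumOfSquares(x, x.length) + ", sum y^2 = " + sumOfSquares(y, y.length));
        System.out.println("(n-1)s^2 = " + nMinusOneTimesS2(x) + ", y_1^2 + ... + y_{n-1}^2 = " + sumOfSquares(y, y.length - 1));
    }
}
```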

The deviations $(x_i - \bar x)$ that enter the squared terms are not independent. One has the following relation between them:

$$\sum_{i=1}^{n}(x_i - \bar x) = 0.$$

One can show that every additional relation between the squared terms reduces the number of degrees of freedom by one. Later we will make frequent use of this result, which we only state here without proof.


6.8  Sampling  by  Counting:  Small  Samples 


Samples are often obtained in the following way. One draws $n$ elements from a population, checks whether they possess a given characteristic, and accepts only those $k$ elements into the sample that have the characteristic. The remaining $n - k$ elements are rejected, i.e., their properties are not recorded. This approach thus amounts to counting $k$ out of the $n$ elements drawn.

This approach corresponds exactly to selecting a sample according to a binomial distribution. The parameters $p$ and $q$ of this distribution then correspond to the probabilities of occurrence and non-occurrence of the property in question. As will be shown in Example 7.5,

$$S(p) = \frac{k}{n} \tag{6.8.1}$$

is the maximum likelihood estimator of the parameter $p$. The variance of $S$ is

$$\sigma^2(S(p)) = \frac{p(1-p)}{n}. \tag{6.8.2}$$
By using (6.8.1) it can be estimated from the sample by

$$s^2(S(p)) = \frac{1}{n}\,\frac{k}{n}\left(1 - \frac{k}{n}\right). \tag{6.8.3}$$

We define the error $\Delta k$ as

$$\Delta k = \sqrt{s^2(S(np))}. \tag{6.8.4}$$

By using (6.8.3) we obtain

$$\Delta k = \sqrt{k\left(1 - \frac{k}{n}\right)}. \tag{6.8.5}$$


The error $\Delta k$ only depends on the number of elements counted and on the size of the sample. It is called the statistical error. A particularly important case is that of small $k$, or more precisely, the case $k \ll n$. In this limit we can define $\lambda = np$ and, following Sect. 5.4, consider the counted number $k$ as a single element of a sample taken from a Poisson-distributed population with parameter $\lambda$. From (6.8.1) and (6.8.5) we obtain

$$S(\lambda) = S(np) = k, \tag{6.8.6}$$

$$\Delta\lambda = \sqrt{k}. \tag{6.8.7}$$

(This can be derived by using the result of Example 7.4 with $N = 1$.) The result (6.8.7) is often written in an actually incorrect but easy-to-remember form,

$$\Delta k = \sqrt{k},$$

which is read: The statistical error of the counted number $k$ is $\sqrt{k}$.
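The two error formulas (6.8.5) and (6.8.7) differ only by the factor $1 - k/n$ under the square root. This small sketch (hypothetical class name) shows how the binomial statistical error approaches $\sqrt{k}$ as $n$ grows for fixed $k$.

```java
/** Sketch of Eqs. (6.8.5) and (6.8.7): statistical error of a count k out of n drawn elements. */
class CountingError {

    /** Binomial case, Eq. (6.8.5): Delta k = sqrt(k (1 - k/n)). */
    static double binomialError(int k, int n) {
        return Math.sqrt(k * (1.0 - (double) k / n));
    }

    /** Poisson limit k << n, Eq. (6.8.7): Delta k = sqrt(k). */
    static double poissonError(int k) {
        return Math.sqrt(k);
    }

    public static void main(String[] args) {
        int k = 10;
        for (int n : new int[] {20, 100, 10000}) {
            System.out.printf("k = %d, n = %5d: binomial %.4f, Poisson %.4f%n",
                    k, n, binomialError(k, n), poissonError(k));
        }
    }
}
```

For $k = 10$ the binomial error is about 2.24, 3.00 and 3.16 for $n = 20$, 100 and 10000, compared with $\sqrt{10} \approx 3.16$; the Poisson form overstates the error noticeably only when $k$ is a sizable fraction of $n$.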

In order to interpret the statistical error $\Delta\lambda = \sqrt{k}$, we must examine the Poisson distribution somewhat more closely. Let us begin with the case where $k$ is not too small (say, $k > 20$). For large values of $\lambda$ the Poisson distribution becomes a Gaussian distribution with mean $\lambda$ and variance $\sigma^2 = \lambda$. This can be seen qualitatively from Fig. 5.6. As long as $k$ is not too small, i.e., $k \gg 1$, we can then treat the Poisson distribution in $k$ with parameter $\lambda$ as a normal distribution in $x$ with mean $\lambda$ and variance $\sigma^2 = \lambda$. The discrete variable $k$ is then replaced by the continuous variable $x$. The probability density of $x$ is

$$f(x;\lambda) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\lambda)^2}{2\sigma^2}\right\} = \frac{1}{\sqrt{2\pi\lambda}}\exp\left\{-\frac{(x-\lambda)^2}{2\lambda}\right\}. \tag{6.8.8}$$


The  observation  of  k  events  corresponds  to  observing  once  the  value  of  the 
random  variable  x  =  k. 
With the help of the probability density (6.8.8) we now want to determine the confidence limits at a given confidence level $1 - \alpha$ in such a way that

$$P(\lambda_- < \lambda < \lambda_+) = 1 - \alpha. \tag{6.8.9}$$

That is, one requires that the probability that the true value of $\lambda$ is contained within the confidence limits $\lambda_-$ and $\lambda_+$ be equal to the confidence level $1 - \alpha$. The limiting cases $\lambda = \lambda_-$ and $\lambda = \lambda_+$ are depicted in Fig. 6.10. They are determined such that

$$P(x > k \mid \lambda = \lambda_+) = 1 - \alpha/2, \qquad P(x < k \mid \lambda = \lambda_-) = 1 - \alpha/2. \tag{6.8.10}$$


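One clearly has

$$\frac{\alpha}{2} = \int_{-\infty}^{x=k} f(x;\lambda_+)\,\mathrm{d}x = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x=k}\exp\left\{-\frac{(x-\lambda_+)^2}{2\sigma^2}\right\}\mathrm{d}x = \int_{-\infty}^{u=(k-\lambda_+)/\sigma}\phi_0(u)\,\mathrm{d}u = \psi_0\!\left(\frac{k-\lambda_+}{\sigma}\right) \tag{6.8.11}$$

and correspondingly

$$1 - \frac{\alpha}{2} = \int_{-\infty}^{x=k} f(x;\lambda_-)\,\mathrm{d}x = \psi_0\!\left(\frac{k-\lambda_-}{\sigma}\right). \tag{6.8.12}$$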


Here $\phi_0$ and $\psi_0$ are the probability density and distribution function of the standard normal distribution introduced in Sect. 5.8. By using the inverse function $\Omega$ of the distribution function $\psi_0$ [see Eq. (5.8.8)], one obtains

$$\frac{k - \lambda_-}{\sigma} = \Omega(1 - \alpha/2), \qquad \frac{k - \lambda_+}{\sigma} = \Omega(\alpha/2). \tag{6.8.13}$$

Because of (5.8.10) one has $\Omega(1 - \alpha/2) = \Omega'(1 - \alpha)$, and because of the symmetry of the function $\Omega$, $\Omega(1 - \alpha/2) = -\Omega(\alpha/2)$. Further, since $\alpha < 1$, one has $\Omega(1 - \alpha/2) > 0$, $\Omega(\alpha/2) < 0$. From this we finally obtain

$$\lambda_- = k - \sigma\,\Omega'(1 - \alpha), \qquad \lambda_+ = k + \sigma\,\Omega'(1 - \alpha). \tag{6.8.14}$$

According to (6.8.6), $k$ is the best estimator for $\lambda$. Since $\sigma^2 = \lambda$, the best estimator for $\sigma$ is given by $s = \sqrt{k}$. Since we have assumed that $k \gg 1$, the uncertainty in $s$ is significantly smaller than the uncertainty in $k$. We can therefore substitute $s = \sqrt{k}$ for $\sigma$ in (6.8.14) and obtain for the confidence interval with confidence level $1 - \alpha$

$$\lambda_- = k - \sqrt{k}\,\Omega'(1 - \alpha) < \lambda < k + \sqrt{k}\,\Omega'(1 - \alpha) = \lambda_+. \tag{6.8.15}$$
Fig. 6.10: Normal distribution with mean $\lambda$ and standard deviation $\sigma$ for $\lambda = \lambda_-$ and $\lambda = \lambda_+$.


For $1 - \alpha = 68.3$ % we find from Sect. 5.8 or Table I.5 that $\Omega'(0.683) = 1$. What is usually reported,

$$\lambda = k \pm \sqrt{k},$$

which was already the result of (6.8.6) and (6.8.7), thus gives the confidence limits at the confidence level of 68.3 %, but only for the case $k \gg 1$. For the confidence level of 90 %, i.e., for $\alpha = 0.1$, we find $\Omega'(0.9) = 1.65$, and for the confidence level 99 % one has $\Omega'(0.99) = 2.57$.

For very small values of $k$, one can no longer replace the Poisson distribution with the normal distribution. We therefore follow reference [25]. We start again from Eq. (6.8.10), but use, instead of the probability density (6.8.8) for the continuous random variable $x$ with fixed parameter $\lambda$, the Poisson probability for observing the discrete random variable $n$ for a given $\lambda$,

$$f(n;\lambda) = \frac{\lambda^n}{n!}\,e^{-\lambda}. \tag{6.8.16}$$


For the observation $k$ we now determine the confidence limits $\lambda_-$ and $\lambda_+$, which fulfill (6.8.10) with $x = n$ (Fig. 6.11), and obtain in analogy to (6.8.11) and (6.8.12)


$$1 - \frac{\alpha}{2} = \sum_{n=k+1}^{\infty} f(n;\lambda_+) = 1 - \sum_{n=0}^{k} f(n;\lambda_+) = 1 - F(k+1;\lambda_+),$$

$$1 - \frac{\alpha}{2} = \sum_{n=0}^{k-1} f(n;\lambda_-) = F(k;\lambda_-),$$

or

$$\frac{\alpha}{2} = F(k+1;\lambda_+), \tag{6.8.17}$$

$$1 - \frac{\alpha}{2} = F(k;\lambda_-). \tag{6.8.18}$$

Here

$$F(k;\lambda) = \sum_{n=0}^{k-1} f(n;\lambda) = P(n < k)$$

is the distribution function of the Poisson distribution.

Fig. 6.11: Poisson distribution with parameter $\lambda$ for $\lambda = \lambda_-$ and $\lambda = \lambda_+$ (panel shown for $k = 25$, $p = 0.683$, $\lambda_- = 20.031$).

In order to obtain numerical values for the confidence limits $\lambda_+$ and $\lambda_-$, we solve Eqs. (6.8.17) and (6.8.18). That is, we must construct the inverse function of the Poisson distribution for fixed $k$ and given probability $P$ (in our case $\alpha/2$ and $1 - \alpha/2$),

$$\lambda = \lambda_P(k). \tag{6.8.19}$$

This is done numerically with the method StatFunct.quantilePoisson. The function (6.8.19) is given in Table I.1 for frequently occurring values of $P$.

For extremely small samples one is often only interested in an upper confidence limit at confidence level $\beta = 1 - \alpha$. This is obtained by requiring

$$P(n > k \mid \lambda = \lambda^{(\mathrm{up})}) = \beta = 1 - \alpha \tag{6.8.20}$$

instead of (6.8.10). Thus one has

$$\alpha = \sum_{n=0}^{k} f(n;\lambda^{(\mathrm{up})}) = F(k+1;\lambda^{(\mathrm{up})}). \tag{6.8.21}$$

For the extreme case $k = 0$, i.e., for a sample in which no event was observed, one obtains the upper limit $\lambda^{(\mathrm{up})}$ by inverting $\alpha = F(1;\lambda^{(\mathrm{up})})$. The upper limit then has the following meaning. If the true value of the parameter were in fact $\lambda = \lambda^{(\mathrm{up})}$ and if one were to repeat the experiment many times, then the probability of observing at least one event is $\beta$. The observation $k = 0$ is then expressed in the following way: one has $\lambda < \lambda^{(\mathrm{up})}$ with a confidence level of $1-\alpha$. From Table 1.1 one finds that $k = 0$ corresponds to $\lambda < 2.996 \approx 3$ at a confidence level of 95 %.
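For $k = 0$ the inversion can even be done in closed form, since only the term $n = 0$ of the Poisson distribution enters $F(1;\lambda)$:

$$\alpha = F(1;\lambda^{(\mathrm{up})}) = f(0;\lambda^{(\mathrm{up})}) = e^{-\lambda^{(\mathrm{up})}} \quad\Longrightarrow\quad \lambda^{(\mathrm{up})} = -\ln\alpha\,.$$

For $\alpha = 0.05$ this gives $\lambda^{(\mathrm{up})} = \ln 20 \approx 2.996$, the value quoted above; for $\alpha = 0.10$ one obtains $\ln 10 \approx 2.303$, the entry for $k = 0$ in Table 6.3.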
Example 6.7: Determination of a lower limit for the lifetime of the proton from the observation of no decays

As already mentioned, the probability for the decay of a radioactive nucleus within the time $t$ is

$$P(t) = \frac{1}{\tau}\int_0^t e^{-x/\tau}\,\mathrm{d}x\,.$$

Here $\tau$ is the mean lifetime of the nucleus. For $t \ll \tau$ the expression simplifies to

$$P(t) = t/\tau\,.$$

For a total of $N$ nuclei one expects that

$$k = N P(t) = N t/\tau$$

nuclei will decay within the time $t$. The mean lifetime $\tau$ is obtained by counting the number of such decays. If one observes $k$ decays from a total of $N$ nuclei in a time $t$, then one obtains as the measured value of $\tau$

$$\tilde{\tau} = \frac{N t}{k}\,.$$

Of particular interest is the mean lifetime of the proton, one of the primary building blocks of matter. In experiments recently carried out with great effort, one observes large numbers of protons with detectors capable of detecting each individual decay. Up to now, not a single decay has been seen. According to Table 1.9, the true expected number of decays $\lambda$ does not exceed three (at a confidence level of 95 %). One has therefore

$$\tau > \frac{N}{3}\,t$$

at this confidence level. Typical experimental values are $t = 0.3$ years and $N = 10^{33}$, i.e.,

$$\tau > 10^{32}\ \text{years}\,.$$

The proton can therefore be considered as stable even over cosmological time scales, if one considers that the age of the universe is estimated to be only around $10^{10}$ years. ■

6.9  Small  Samples  with  Background 

In many experiments one is faced with the following situation. For the detected events one cannot determine whether they belong to the type that is actually of interest (signal events) or to another type (background events).




The number of events observed in the experiment then follows a Poisson distribution with the parameter $\lambda = \lambda_S + \lambda_B$. Here $\lambda_S$ is the sought-after parameter of the number of signal events, and $\lambda_B$ is the parameter for the background events, which must of course be known if one wants to obtain information about $\lambda_S$. (In an experiment as in Example 6.7, one might have, for example, an admixture of radioactive nuclei whose decays cannot be distinguished from those of the proton. If the number of such nuclei and their lifetime are known, then $\lambda_B$ can be computed.)

We are now tempted to simply take the results of the last section, to determine the confidence limits $\lambda_\pm$ and the upper limit $\lambda^{(\mathrm{up})}$, and to set $\lambda_{S\pm} = \lambda_\pm - \lambda_B$, $\lambda_S^{(\mathrm{up})} = \lambda^{(\mathrm{up})} - \lambda_B$. This procedure can, however, lead to nonsensical results. (As seen in Example 6.7, one has $\lambda^{(\mathrm{up})} = 3$ at a confidence level of 95 % for $k = 0$. For $\lambda_B = 4$, $k = 0$ we would obtain $\lambda_S^{(\mathrm{up})} = -1$, although a value $\lambda_S < 0$ has no meaning.)

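The considerations up to now are based on the following. The probability for observing $n$ events, $n = n_S + n_B$, is

$$f(n;\lambda_S+\lambda_B) = \frac{1}{n!}\,e^{-(\lambda_S+\lambda_B)}(\lambda_S+\lambda_B)^{n}\,, \tag{6.9.1}$$

and the probabilities to observe $n_S$ signal events and $n_B$ background events are

$$f(n_S;\lambda_S) = \frac{1}{n_S!}\,e^{-\lambda_S}\lambda_S^{n_S}\,, \tag{6.9.2}$$

$$f(n_B;\lambda_B) = \frac{1}{n_B!}\,e^{-\lambda_B}\lambda_B^{n_B}\,. \tag{6.9.3}$$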

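The validity of (6.9.1) was shown in Example 5.5 with the help of the characteristic function, starting from the independence of the two Poisson distributions (6.9.2) and (6.9.3). One can also obtain it directly by summing all products of the probabilities (6.9.2) and (6.9.3) that lead to $n = n_S + n_B$, by application of (B.4) and (B.6),

$$\begin{aligned}
\sum_{n_S=0}^{n} f(n_S;\lambda_S)\,f(n-n_S;\lambda_B)
&= e^{-(\lambda_S+\lambda_B)} \sum_{n_S=0}^{n} \frac{\lambda_S^{n_S}\,\lambda_B^{\,n-n_S}}{n_S!\,(n-n_S)!} \\
&= \frac{1}{n!}\,e^{-(\lambda_S+\lambda_B)} \sum_{n_S=0}^{n} \binom{n}{n_S}\lambda_S^{n_S}\,\lambda_B^{\,n-n_S} \\
&= \frac{1}{n!}\,e^{-(\lambda_S+\lambda_B)}(\lambda_S+\lambda_B)^{n} = f(n;\lambda_S+\lambda_B)\,.
\end{aligned}$$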
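The identity can also be checked numerically. The following Java sketch (the class name and the values $\lambda_S = 1.5$, $\lambda_B = 2.5$ are illustrative assumptions) evaluates the left-hand side of the derivation term by term and compares it with $f(n;\lambda_S+\lambda_B)$; the two printed columns should agree to rounding precision.

```java
/** Numerical check of (6.9.1): the convolution of two Poisson distributions. Class name is illustrative. */
public class PoissonConvolution {

    /** Poisson probability f(n; lambda) = lambda^n e^(-lambda) / n!, built up factor by factor. */
    public static double pmf(int n, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= n; i++) {
            p *= lambda / i;
        }
        return p;
    }

    /** Left-hand side of the derivation: sum over n_S of f(n_S; lamS) * f(n - n_S; lamB). */
    public static double convolution(int n, double lamS, double lamB) {
        double sum = 0.0;
        for (int nS = 0; nS <= n; nS++) {
            sum += pmf(nS, lamS) * pmf(n - nS, lamB);
        }
        return sum;
    }

    public static void main(String[] args) {
        double lamS = 1.5, lamB = 2.5;   // illustrative values
        for (int n = 0; n <= 10; n++) {
            System.out.printf("n=%2d  convolution=%.10f  f(n; lamS+lamB)=%.10f%n",
                    n, convolution(n, lamS, lamB), pmf(n, lamS + lamB));
        }
    }
}
```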
The difficulties explained above are overcome with a method developed by Zech [26]. In an experiment in which $k$ events are recorded, the number of background events cannot simply be given by (6.9.3), since from the result of the experiment it is known that $n_B \le k$. One must therefore replace (6.9.3) by

$$f'(n_B;\lambda_B) = f(n_B;\lambda_B) \Big/ \sum_{n_B=0}^{k} f(n_B;\lambda_B)\,, \qquad n_B \le k\,. \tag{6.9.4}$$

This distribution is normalized to unity in the region $0 \le n_B \le k$. In a corresponding way the distribution

$$f'(n;\lambda_S+\lambda_B) = f(n;\lambda_S+\lambda_B) \Big/ \sum_{n_B=0}^{k} f(n_B;\lambda_B) \tag{6.9.5}$$

takes the place of (6.9.1).

In this way one obtains, in analogy to (6.8.17) and (6.8.18), for the limits of the confidence interval $\lambda_{S-} \le \lambda_S \le \lambda_{S+}$ at a confidence level of $1-\alpha$

$$\frac{\alpha}{2} = F'(k+1;\lambda_{S+}+\lambda_B)\,, \tag{6.9.6}$$

$$1-\frac{\alpha}{2} = F'(k;\lambda_{S-}+\lambda_B)\,. \tag{6.9.7}$$

Here

$$F'(k;\lambda_S+\lambda_B) = \sum_{n=0}^{k-1} f'(n;\lambda_S+\lambda_B) = P(n<k) \tag{6.9.8}$$

is the distribution function of the renormalized distribution (6.9.4). If only an upper limit at confidence level $1-\alpha$ is desired, then one clearly has, in analogy to (6.8.21),

$$\alpha = F'(k+1;\lambda_S^{(\mathrm{up})}+\lambda_B)\,. \tag{6.9.9}$$

Table 6.3 gives some numerical values computed with the methods of the class SmallSample. Note that for $k = 0$, Eq. (6.9.7) has no meaning, so that $\lambda_{S-}$ cannot be defined. In this case Eq. (6.9.6) and also $\lambda_{S+}$ are not meaningful. (In the table, however, the values for $\lambda_{S-}$ and $\lambda_{S+}$ are shown as computed by the program. This sets $\lambda_{S-} = 0$ and computes $\lambda_{S+}$ according to (6.9.6).)
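Zech's limits can be sketched in the same way as the limits of Sect. 6.8. In this minimal Java sketch (the class ZechLimits and its methods are hypothetical, not the SmallSample API) the renormalized distribution function is evaluated as $F'(K;\lambda_S+\lambda_B) = P(n<K)/P(n_B<K)$, i.e., the renormalizing sum runs over $n_B \le K-1$. For the upper limits ($K = k+1$) this is exactly the sum over $n_B \le k$ in (6.9.4) and (6.9.5); applying the same rule to the lower limit ($K = k$) is an assumption of the sketch, chosen because it reproduces, for example, the row $k = 2$, $\lambda_B = 1$ of Table 6.3 (0.100, 5.410, 4.443).

```java
/**
 * Sketch (hypothetical class, not the book's SmallSample): Zech's limits for a
 * Poisson signal lambda_S on top of a known background lambda_B, Eqs. (6.9.6),
 * (6.9.7) and (6.9.9), solved by bisection in lambda_S.
 */
public class ZechLimits {

    /** Poisson distribution function F(K; lambda) = P(n < K). */
    static double poissonCdf(int bigK, double lambda) {
        double term = Math.exp(-lambda);
        double sum = 0.0;
        for (int n = 0; n < bigK; n++) {
            sum += term;
            term *= lambda / (n + 1);
        }
        return sum;
    }

    /**
     * Renormalized distribution function F'(K; lamS + lamB) = P(n < K) / P(nB < K).
     * For K = k + 1 the denominator is the sum over nB <= k of (6.9.4)/(6.9.5);
     * using the same rule for K = k (lower limit) is an assumption of this sketch.
     */
    public static double renormalizedCdf(int bigK, double lamS, double lamB) {
        return poissonCdf(bigK, lamS + lamB) / poissonCdf(bigK, lamB);
    }

    /** Solves F'(K; lamS + lamB) = p for lamS >= 0; F' equals 1 at lamS = 0 and falls as lamS grows. */
    static double solve(int bigK, double p, double lamB) {
        double lo = 0.0, hi = 1.0;
        while (renormalizedCdf(bigK, hi, lamB) > p) {
            hi *= 2.0;
        }
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (renormalizedCdf(bigK, mid, lamB) > p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** lambda_S- from (6.9.7); undefined for k = 0 and then set to 0, as in Table 6.3. */
    public static double lower(int k, double lamB, double beta) {
        return k == 0 ? 0.0 : solve(k, 1.0 - 0.5 * (1.0 - beta), lamB);
    }

    /** lambda_S+ from (6.9.6). */
    public static double upper(int k, double lamB, double beta) {
        return solve(k + 1, 0.5 * (1.0 - beta), lamB);
    }

    /** One-sided upper limit lambda_S^(up) from (6.9.9). */
    public static double upperOneSided(int k, double lamB, double beta) {
        return solve(k + 1, 1.0 - beta, lamB);
    }

    public static void main(String[] args) {
        double beta = 0.90;
        for (int k = 0; k <= 5; k++) {
            for (double lamB : new double[] {0.0, 1.0, 2.0}) {
                System.out.printf("k=%d  lamB=%.1f  %.3f  %.3f  %.3f%n", k, lamB,
                        lower(k, lamB, beta), upper(k, lamB, beta), upperOneSided(k, lamB, beta));
            }
        }
    }
}
```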

Table 6.3: Limits $\lambda_{S-}$ and $\lambda_{S+}$ of the confidence interval and upper confidence limits $\lambda_S^{(\mathrm{up})}$ for various values of $\lambda_B$ and various very small sample sizes $k$ for a fixed confidence level of 90 %.

β = 0.90

k    λ_B    λ_S−      λ_S+      λ_S(up)
0    0.0    0.000     2.996     2.303
0    1.0    0.000     2.996     2.303
0    2.0    0.000     2.996     2.303
1    0.0    0.051     4.744     3.890
1    1.0    0.051     4.113     3.272
1    2.0    0.051     3.816     2.995
2    0.0    0.355     6.296     5.322
2    1.0    0.100     5.410     4.443
2    2.0    0.076     4.824     3.877
3    0.0    0.818     7.754     6.681
3    1.0    0.226     6.782     5.711
3    2.0    0.125     5.983     4.926
4    0.0    1.366     9.154     7.994
4    1.0    0.519     8.159     7.000
4    2.0    0.226     7.241     6.087
5    0.0    1.970    10.513     9.275
5    1.0    1.009     9.514     8.276
5    2.0    0.433     8.542     7.306

6.10 Determining a Ratio of Small Numbers of Events

Often a number of signal events $k$ is measured and compared to a number of reference events $d$. One is interested in the true value $r$ of the ratio of the number of signal events to the number of reference events, or, more precisely, the ratio of the probability to observe a signal event to that of a reference event. As an estimator for this ratio one clearly uses

$$r = k/d\,.$$


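We now ask for the confidence limits of $r$. If $k$ and $d$ are sufficiently large, then they may be approximated as Gaussian variables with standard deviations $\sigma_k = \sqrt{k}$, $\sigma_d = \sqrt{d}$. Then according to the law of error propagation one has

$$\left(\frac{\Delta r}{r}\right)^{2} = \left(\frac{\sigma_k}{k}\right)^{2} + \left(\frac{\sigma_d}{d}\right)^{2} = \frac{1}{k} + \frac{1}{d}\,. \tag{6.10.1}$$

If in addition one has $d \gg k$, then $\Delta r$ is simply

$$\Delta r = \frac{r}{\sqrt{k}}\,. \tag{6.10.2}$$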
If the requirements for the validity of (6.10.1) or even of (6.10.2) are not fulfilled, i.e., if $k$ and $d$ are small numbers, then one must use considerations developed by James and Roos [23]. Clearly, when one observes an individual event, the probability that it is a signal event is given by

$$\frac{r}{1+r} \tag{6.10.3}$$

and the probability that it is a reference event is

$$1 - \frac{r}{1+r} = \frac{1}{1+r}\,. \tag{6.10.4}$$


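In an experiment in which a total of $N = k + d$ events are observed, the probability that exactly $n$ signal events are present is given by a binomial distribution (5.1.3). This is

$$f(n;r) = \binom{N}{n}\left(\frac{r}{1+r}\right)^{n}\left(\frac{1}{1+r}\right)^{N-n}. \tag{6.10.5}$$

The probability to have $n < k$ is then

$$P(n<k) = \sum_{n=0}^{k-1} f(n;r) = F(k;r) \tag{6.10.6}$$

with

$$F(k;r) = \frac{1}{(1+r)^{N}}\sum_{n=0}^{k-1}\binom{N}{n}\,r^{n}\,, \tag{6.10.7}$$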

i.e., the distribution function of the binomial distribution. To determine the limits $r_-$ and $r_+$ of the confidence interval at the confidence level $\beta = 1-\alpha$, we use, in analogy to (6.8.17) and (6.8.18),

$$\frac{\alpha}{2} = F(k+1;r_+)\,, \tag{6.10.8}$$

$$1-\frac{\alpha}{2} = F(k;r_-)\,. \tag{6.10.9}$$

If one only seeks an upper limit at the confidence level $\beta = 1-\alpha$, it can be obtained from [see Eq. (6.8.21)]

$$\alpha = F(k+1;r^{(\mathrm{up})})\,. \tag{6.10.10}$$

The quantities $r_+$, $r_-$, and $r^{(\mathrm{up})}$ can be computed for given values of $k$, $d$, and $\beta$ with the class SmallSample.
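The limits (6.10.8) to (6.10.10) can be sketched in the same manner as the Poisson limits, now with the binomial distribution function written as $F(K;r) = (1+r)^{-N}\sum_{n=0}^{K-1}\binom{N}{n}r^{n}$ and $N = k + d$. The class RatioLimits and its methods are hypothetical, not the SmallSample interface, and at least one reference event ($d \ge 1$) is assumed so that a solution exists. Taking $d = 10$ and $\beta = 0.90$, the sketch gives, e.g., $r_+ \approx 0.349$ and $r^{(\mathrm{up})} \approx 0.259$ for $k = 0$, and $r_+ \approx 0.573$, $r^{(\mathrm{up})} \approx 0.450$ for $k = 1$, which coincide with the $r_B = 0$ entries of Table 6.4 (for $r_B = 0$ the treatment of Sect. 6.11 reduces to the present one).

```java
/**
 * Sketch (hypothetical class, not the book's SmallSample): limits for the ratio r
 * of signal to reference events following (6.10.8)-(6.10.10), with N = k + d
 * events in total and signal probability r / (1 + r) per event.
 */
public class RatioLimits {

    /** F(K; r) = (1+r)^(-N) * sum_{n=0}^{K-1} C(N,n) r^n, i.e. P(n < K) for the binomial distribution. */
    public static double binomialCdf(int bigK, int total, double r) {
        double term = Math.pow(1.0 + r, -total);          // n = 0 term: (1/(1+r))^N
        double sum = 0.0;
        for (int n = 0; n < bigK && n <= total; n++) {
            sum += term;
            term *= (double) (total - n) / (n + 1) * r;   // C(N,n+1) r^(n+1) / (C(N,n) r^n)
        }
        return sum;
    }

    /** Solves F(K; r) = p for r >= 0 by bisection; F falls from 1 towards 0 as r grows when K <= N. */
    static double solve(int bigK, int total, double p) {
        if (bigK > total) {
            throw new IllegalArgumentException("need K <= N, i.e. at least one reference event");
        }
        double lo = 0.0, hi = 1.0;
        while (binomialCdf(bigK, total, hi) > p) {
            hi *= 2.0;
        }
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (binomialCdf(bigK, total, mid) > p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** r_- from (6.10.9); no solution exists for k = 0, so 0 is returned. */
    public static double lower(int k, int d, double beta) {
        return k == 0 ? 0.0 : solve(k, k + d, 1.0 - 0.5 * (1.0 - beta));
    }

    /** r_+ from (6.10.8). */
    public static double upper(int k, int d, double beta) {
        return solve(k + 1, k + d, 0.5 * (1.0 - beta));
    }

    /** r^(up) from (6.10.10). */
    public static double upperOneSided(int k, int d, double beta) {
        return solve(k + 1, k + d, 1.0 - beta);
    }

    public static void main(String[] args) {
        int d = 10;
        double beta = 0.90;
        for (int k = 0; k <= 5; k++) {
            System.out.printf("k=%d  r-=%.3f  r+=%.3f  r(up)=%.3f%n",
                    k, lower(k, d, beta), upper(k, d, beta), upperOneSided(k, d, beta));
        }
    }
}
```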


6.11 Ratio of Small Numbers of Events with Background


By combining, as done by Swartz [24], the ideas of Sects. 6.9 and 6.10, one can deal with the following situation. In an experiment one has three types of events: signal, background, and reference. Signal and background events cannot be distinguished from each other. Suppose that in the experiment one has detected a total of $k$ signal and background events and $d$ reference events. Let us label with $r_S$ and $r_B$ the true values (in the sense of the definition at the beginning of the previous section) of the ratios of the numbers of signal to reference and background to reference events. Then the probabilities that a randomly selected event is signal or background, $p_S$ and $p_B$, are

$$p_S = \frac{r_S}{1+r_S+r_B}\,, \qquad p_B = \frac{r_B}{1+r_S+r_B}\,. \tag{6.11.1}$$

The probability that it is a reference event is then

$$p_R = 1 - p_S - p_B = \frac{1}{1+r_S+r_B}\,. \tag{6.11.2}$$


If one has a total of $N = k + d$ events in the experiment, then the individual probabilities that one has exactly $n_S$ signal events, $n_B$ background events, and $n_R = N - n_S - n_B$ reference events are

$$f_S(n_S;p_S) = \binom{N}{n_S}\,p_S^{n_S}(1-p_S)^{N-n_S}\,, \tag{6.11.3}$$

$$f_B(n_B;p_B) = \binom{N}{n_B}\,p_B^{n_B}(1-p_B)^{N-n_B}\,, \tag{6.11.4}$$

$$f_R(n_R;p_R) = \binom{N}{n_R}\,p_R^{n_R}(1-p_R)^{N-n_R}\,. \tag{6.11.5}$$

Since there are now three mutually exclusive types of events, one has, instead of a binomial distribution (6.10.5), a trinomial distribution, i.e., a multinomial distribution (5.1.10) for three classes of events. The probability that in an experiment with a total of $N$ events one has exactly $n_S$ signal, $n_B$ background, and $N - n_S - n_B$ reference events is therefore

$$f(n_S,n_B;r_S,r_B) = \frac{N!}{n_S!\,n_B!\,(N-n_S-n_B)!}\,p_S^{n_S}\,p_B^{n_B}\,(1-p_S-p_B)^{N-n_S-n_B}\,. \tag{6.11.6}$$


Here, however, we have not taken into consideration that the number of background events cannot be greater than $k$. In a manner similar to (6.9.4) one uses

$$f'_B(n_B;p_B) = f_B(n_B;p_B) \Big/ \sum_{n_B=0}^{k} f_B(n_B;p_B)\,, \qquad n_B \le k\,, \tag{6.11.7}$$

in place of $f_B$. This replacement must also be made in (6.11.6), which gives

$$f'(n_S,n_B;r_S,r_B) = f(n_S,n_B;r_S,r_B) \Big/ \sum_{n_B=0}^{k} f_B(n_B;p_B)\,. \tag{6.11.8}$$

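The probability to have $n_S$ signal events regardless of the number of background events is

$$f'(n_S;r_S,r_B) = \sum_{n_B=0}^{k} f'(n_S,n_B;r_S,r_B)\,,$$

and finally the probability that the total number $n_S + n_B$ of signal and background events is smaller than $k$ is

$$\begin{aligned}
F'(k;r_S,r_B) &= \sum_{n_B=0}^{k-1}\,\sum_{n_S=0}^{k-n_B-1} f'(n_S,n_B;r_S,r_B)\\
&= \frac{\displaystyle\sum_{n_B=0}^{k-1}\,\sum_{n_S=0}^{k-n_B-1}\frac{N!}{n_S!\,n_B!\,(N-n_S-n_B)!}\,p_S^{n_S}\,p_B^{n_B}\,(1-p_S-p_B)^{N-n_S-n_B}}{\displaystyle\sum_{n_B=0}^{k}\frac{N!}{n_B!\,(N-n_B)!}\,p_B^{n_B}\,(1-p_B)^{N-n_B}}\,.
\end{aligned}$$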


Since $r_B$ was assumed to be known, the quantity $F'$ for a given $k$ depends only on $r_S$. Similar to (6.9.6) and (6.9.7), one can determine the limits $r_{S+}$ and $r_{S-}$ of the confidence interval for $r_S$ with confidence level $\beta = 1-\alpha$ from the following requirements:

$$\frac{\alpha}{2} = F'(k+1;r_{S+},r_B)\,, \tag{6.11.9}$$

$$1-\frac{\alpha}{2} = F'(k;r_{S-},r_B)\,. \tag{6.11.10}$$

If one only wants an upper limit with confidence level $\beta = 1-\alpha$, this can be found according to (6.9.9) from

$$\alpha = F'(k+1;r_S^{(\mathrm{up})},r_B)\,. \tag{6.11.11}$$

Table 6.4 contains some numerical values computed with methods of the class SmallSample. For $k = 0$, however, (6.11.10) and hence also $r_{S-}$ have no meaning. Similarly, (6.11.9) and $r_{S+}$ are not meaningful for $k = 0$. (In the table, however, values for $r_{S-}$ and $r_{S+}$ are given for $k = 0$ as computed by the program. For $k = 0$ this sets $r_{S-} = 0$ and determines $r_{S+}$ according to (6.11.9).)
Table 6.4: Limits $r_{S-}$ and $r_{S+}$ of the confidence interval and upper confidence limit $r_S^{(\mathrm{up})}$ for various values of $r_B$ and various very small values of $k$ for a fixed number of reference events $d$ and fixed confidence level of 90 %.

β = 0.90, d = 10

k    r_B    r_S−      r_S+      r_S(up)
0    0.0    0.000     0.349     0.259
0    0.1    0.000     0.349     0.259
0    0.2    0.000     0.349     0.259
1    0.0    0.005     0.573     0.450
1    0.1    0.005     0.502     0.382
1    0.2    0.005     0.464     0.348
2    0.0    0.034     0.780     0.627
2    0.1    0.010     0.686     0.535
2    0.2    0.007     0.613     0.467
3    0.0    0.077     0.979     0.799
3    0.1    0.020     0.880     0.701
3    0.2    0.012     0.788     0.612
4    0.0    0.127     1.174     0.968
4    0.1    0.044     1.074     0.869
4    0.2    0.019     0.976     0.771
5    0.0    0.180     1.367     1.135
5    0.1    0.085     1.267     1.035
5    0.2    0.034     1.167     0.936

6.12  Java  Classes  and  Example  Programs 

Java  Classes  Referring  to  Samples 

Sample  contains  methods  computing  characteristic  parameters  of  a  sample: 
mean,  variance  and  standard  deviation  as  well  as  the  errors  of  these 
quantities. 

SmallSample contains methods computing the confidence limits for small samples.

Histogram allows the construction and administration of a histogram.
Example Program 6.1: The class E1Sample demonstrates the use of the class Sample

This  short  program  generates  a  sample  of  size  N  taken  from  the  standard  normal 
distribution.  It  computes  the  six  quantities:  mean,  error  of  the  mean,  variance,  error 
of  the  variance,  standard  deviation,  and  error  of  the  standard  deviation,  and  outputs 
each  quantity  in  a  single  line. 

Example Program 6.2: The class E2Sample demonstrates the use of the classes Histogram and GraphicsWithHistogram

Initialization, filling, and graphical representation of a histogram are demonstrated for a sample of $N$ elements from the standardized normal distribution. Interactive input is provided for $N$ as well as for the lower boundary $x_0$, the bin width $\Delta x$, and the number of bins $n_x$ of the histogram. The histogram is initialized, and the sample elements are generated and entered into the histogram. Finally the histogram graphics is produced.

Example  Program  6.3:  The  class  E3Sample  demonstrates  the  use  of 
class  GraphicsWith2DScatterDiagrams 

A  scatter  plot  is  created  and  later  displayed  graphically.  The  coordinates  of  the  points 
making  up  the  scatter  plot  are  given  as  pairs  of  random  numbers  from  a  bivariate 
normal distribution (cf. Sect. 4.10). The program asks for the parameters of the normal distribution (means $a_1$, $a_2$, standard deviations $\sigma_1$, $\sigma_2$, correlation coefficient $\rho$) and for the number of random number pairs to be generated. It generates the pairs, prepares the caption and the labeling of axes and scales, and displays the plot (Fig. 6.12).

Example  Program  6.4:  The  class  E4Sample  demonstrates  using  the 
methods  of  the  class  SmallSample  to  compute  confidence  limits 

The program computes the limits $\lambda_{S-}$, $\lambda_{S+}$, and $\lambda_S^{(\mathrm{up})}$ for the Poisson parameter of a signal. The user enters interactively the number of observed events $k$, the confidence level $\beta = 1-\alpha$, and the Poisson parameter $\lambda_B$ of the background. Suggestions: (a) Verify a few lines from Table 6.3. (b) Choose $\beta = 0.683$, $\lambda_B = 0$ and compare, for different values of $k$, the values $\lambda_{S-}$ and $\lambda_{S+}$ with the naive statement $\lambda = k \pm \sqrt{k}$.

Example Program 6.5: The class E5Sample demonstrates the use of methods of the class SmallSample to compute confidence limits of ratios

The program computes the limits $r_{S-}$, $r_{S+}$, and $r_S^{(\mathrm{up})}$ for the ratio $r$ of the number of signal to reference events in the limit of a large number of events. More precisely, $r$ is the ratio of the Poisson parameters $\lambda_S$ to $\lambda_R$ of the signal to reference events. The program asks interactively for the number $k$ of observed (signal plus background) events, for the number $d$ of reference events, for the confidence level $\beta = 1-\alpha$, and for the expected ratio $r_B = \lambda_B/\lambda_R$ for background events.

Suggestion: Verify a few lines from Table 6.4.
Fig. 6.12: Pairs of random numbers taken from a bivariate normal distribution.


Example  Program  6.6:  The  class  E6Sample  simulates  experiments 
with  few  events  and  background 

A total of $n_{\mathrm{exp}}$ experiments are simulated. In each experiment $N$ objects are analyzed. Each object yields with probability $p_S = \lambda_S/N$ a signal event and with probability $p_B = \lambda_B/N$ a background event. The numbers of events found in the simulated experiment are $k_S$, $k_B$, and $k = k_S + k_B$. In the real experiment only $k$ is known. The limits $\lambda_{S-}$, $\lambda_{S+}$, and $\lambda_S^{(\mathrm{up})}$ for $\lambda_S$ are computed for a given confidence level $\beta = 1-\alpha$ and a given value of $\lambda_B$ and are displayed for each experiment.

Suggestion: Choose, e.g., $n_{\mathrm{exp}} = 20$, $N = 1000$, $\lambda_S = 5$, $\lambda_B = 2$, $\beta = 0.9$ and find out whether, as expected, this simulation yields for 10 % of the experiments an interval $(\lambda_{S-}, \lambda_{S+})$ which does not contain the value $\lambda_S$. Keep in mind the meaning of the statistical error of your observation when using only 20 experiments.

Example  Program  6.7:  The  class  E7Sample  simulates  experiments 
with  few  signal  events  and  with  reference  events 

The program asks interactively for the quantities $n_{\mathrm{exp}}$, $N$, $\lambda_S$, $\lambda_B$, $\lambda_R$, and the confidence level $\beta = 1-\alpha$. It computes the probabilities $p_S = \lambda_S/N$, $p_B = \lambda_B/N$, $p_R = \lambda_R/N$, as well as the ratios $r_S = p_S/p_R$ and $r_B = p_B/p_R$, and then simulates a total of $n_{\mathrm{exp}}$ experiments, in each of which $N$ objects are analyzed. Each object is taken to be a signal event with probability $p_S$, a background event with probability $p_B$, and a reference event with probability $p_R$. (Here $p_S + p_B + p_R \ll 1$ is assumed.) The simulation yields the numbers $k_S$, $k_B$, and $d$ for signal, background, and reference events, respectively. In a real experiment only the numbers $k = k_S + k_B$ and $d$ are known, since signal and background events cannot be distinguished. For the given values of $\beta$ and $r_B$ and for the quantities $k$ and $d$ found in the simulated experiments, the limits $r_{S-}$, $r_{S+}$, and $r_S^{(\mathrm{up})}$ are computed and displayed.

Suggestion: Modify the suggestion accompanying Example Program 6.6 by choosing an additional input parameter $\lambda_R = 20$. Find out in how many cases the true value $r_S$ used in the simulation is not contained in the interval $(r_{S-}, r_{S+})$.


7.  The  Method  of  Maximum  Likelihood 


7.1  Likelihood  Ratio:  Likelihood  Function 


In  the  last  chapter  we  introduced  the  concept  of  parameter  estimation.  We 
have  also  described  the  desirable  properties  of  estimators,  though  without 
specifying  how  such  estimators  can  be  constructed  in  a  particular  case.  We 
have  derived  estimators  only  for  the  important  quantities  expectation  value 
and  variance.  We  now  take  on  the  general  problem. 

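In order to specify explicitly the parameters

$$\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_p)\,,$$

we now write the probability density of the random variables

$$\mathbf{x} = (x_1, x_2, \ldots, x_n)$$

in the form

$$f = f(\mathbf{x};\boldsymbol{\lambda})\,. \tag{7.1.1}$$

If we now carry out a certain number of experiments, say $N$, or we draw a sample of size $N$ from a population, then we can give a number to each experiment $j$:

$$\mathrm{d}P^{(j)} = f(\mathbf{x}^{(j)};\boldsymbol{\lambda})\,\mathrm{d}\mathbf{x}\,. \tag{7.1.2}$$

The number $\mathrm{d}P^{(j)}$ has the character of an a posteriori probability, i.e., given after the experiment, how probable it was to find the result $\mathbf{x}^{(j)}$ (within a small interval). The total probability to find exactly all of the events

$$\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}$$

is then the product

$$\mathrm{d}P = \prod_{j=1}^{N} f(\mathbf{x}^{(j)};\boldsymbol{\lambda})\,\mathrm{d}\mathbf{x}\,. \tag{7.1.3}$$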


S.  Brandt,  Data  Analysis:  Statistical  and  Computational  Methods  for  Scientists  and  Engineers , 
DOI  10.1007/978-3-319-03762-2 _ 7,  ©  Springer  International  Publishing  Switzerland  2014 
This probability still clearly depends on $\boldsymbol{\lambda}$. There are cases where the population is determined by only two possible sets of parameters, $\boldsymbol{\lambda}_1$ and $\boldsymbol{\lambda}_2$. Such cases occur, for example, in nuclear physics, where the parity of a state is necessarily “even” or “odd”. One can construct the ratio

$$Q = \frac{\prod_{j=1}^{N} f(\mathbf{x}^{(j)};\boldsymbol{\lambda}_1)}{\prod_{j=1}^{N} f(\mathbf{x}^{(j)};\boldsymbol{\lambda}_2)} \tag{7.1.4}$$

and say that the values $\boldsymbol{\lambda}_1$ are “$Q$ times more probable” than the values $\boldsymbol{\lambda}_2$. This factor is called the likelihood ratio.*

A product of the form

$$L = \prod_{j=1}^{N} f(\mathbf{x}^{(j)};\boldsymbol{\lambda}) \tag{7.1.5}$$

is called a likelihood function. One must clearly distinguish between a probability density and a likelihood function, which is a function of a sample and is hence a random variable. In particular, the a posteriori nature of the probability in (7.1.5) is of significance in many discussions.

Example  7.1:  Likelihood  ratio 

Suppose  one  wishes  to  decide  whether  a  coin  belongs  to  type  A  or  B  by 
means  of  a  number  of  tosses.  The  coins  in  question  are  asymmetric  in  such  a 
way  that  A  shows  heads  with  a  probability  of  1/3,  and  B  shows  heads  with  a 
probability  of  2/3. 


A  B 

Heads  1/3  2/3 

Tails  2/3  1/3 

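If an experiment yields heads once and tails four times, then one has

$$L_A = \frac{1}{3}\cdot\left(\frac{2}{3}\right)^{4} = \frac{16}{243}\,, \qquad L_B = \frac{2}{3}\cdot\left(\frac{1}{3}\right)^{4} = \frac{2}{243}\,.$$

One would therefore tend towards the position that the coin is of type A. ■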
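For a general outcome of $h$ heads and $t$ tails (symbols introduced here only for illustration) the likelihood ratio (7.1.4) of the two coin types takes a particularly simple form:

$$Q = \frac{L_A}{L_B} = \frac{(1/3)^{h}\,(2/3)^{t}}{(2/3)^{h}\,(1/3)^{t}} = 2^{\,t-h}\,.$$

For one head and four tails this gives $Q = 2^3 = 8$, so in the sense of (7.1.4) type A is eight times more probable than type B.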

* Although the likelihood ratio $Q$ and the likelihood functions $L$ and $\ell$ introduced below are random variables, since they are functions of a sample, we do not write them here with a special character type.
7.2 The Method of Maximum Likelihood

The generalization of the likelihood ratio is now clear. One gives the greatest confidence to that choice of the parameters $\boldsymbol{\lambda}$ for which the likelihood function (7.1.5) is a maximum. Figure 7.1 illustrates the situation for various forms of the likelihood function for the case of a single parameter $\lambda$.

The maximum can be located simply by setting the first derivative of the likelihood function with respect to the parameter $\lambda$ equal to zero. The derivative of a product with many factors is, however, unpleasant to deal with. One therefore first constructs the logarithm of the likelihood function,

$$\ell = \ln L = \sum_{j=1}^{N} \ln f(\mathbf{x}^{(j)};\boldsymbol{\lambda})\,. \tag{7.2.1}$$

The function $\ell$ is also often called the likelihood function. Sometimes one says explicitly “log-likelihood function”. Clearly the maxima of (7.2.1) are identical with those of (7.1.5). For the case of a single parameter we now construct

$$\ell' = \mathrm{d}\ell/\mathrm{d}\lambda = 0\,. \tag{7.2.2}$$

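The problem of estimating a parameter is now reduced to solving this likelihood equation. By application of (7.2.1) we can write

$$\ell' = \sum_{j=1}^{N}\frac{\mathrm{d}}{\mathrm{d}\lambda}\ln f(\mathbf{x}^{(j)};\lambda) = \sum_{j=1}^{N}\frac{f'(\mathbf{x}^{(j)};\lambda)}{f(\mathbf{x}^{(j)};\lambda)} = \sum_{j=1}^{N}\varphi(\mathbf{x}^{(j)};\lambda)\,, \tag{7.2.3}$$

where

$$\varphi(\mathbf{x}^{(j)};\lambda) = \frac{f'(\mathbf{x}^{(j)};\lambda)}{f(\mathbf{x}^{(j)};\lambda)} \tag{7.2.4}$$

is the logarithmic derivative of the density $f$ with respect to $\lambda$.

Fig. 7.1: Likelihood functions $L(\lambda)$ of a single parameter $\lambda$.

In the general case of $p$ parameters the likelihood equation (7.2.2) is replaced by the system of $p$ simultaneous equations,

$$\frac{\partial\ell}{\partial\lambda_i} = 0\,, \qquad i = 1, 2, \ldots, p\,. \tag{7.2.5}$$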


Example  7.2:  Repeated  measurements  of  differing  accuracy 


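If a quantity is measured with different instruments, then the measurement errors are in general different. The measurements $x^{(j)}$ are spread about the true value $\lambda$. Suppose the errors are normally distributed, so that a measurement corresponds to obtaining a sample from a Gaussian distribution with mean $\lambda$ and standard deviation $\sigma_j$. The a posteriori probability for a measured value is then

$$f(x^{(j)};\lambda)\,\mathrm{d}x = \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left\{-\frac{(x^{(j)}-\lambda)^2}{2\sigma_j^2}\right\}\mathrm{d}x\,.$$

From all $N$ measurements one obtains the likelihood function

$$L = \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left\{-\frac{(x^{(j)}-\lambda)^2}{2\sigma_j^2}\right\} \tag{7.2.6}$$

with the logarithm

$$\ell = -\frac{1}{2}\sum_{j=1}^{N}\frac{(x^{(j)}-\lambda)^2}{\sigma_j^2} + \text{const}\,.$$

The likelihood equation thus becomes

$$\frac{\mathrm{d}\ell}{\mathrm{d}\lambda} = \sum_{j=1}^{N}\frac{x^{(j)}-\lambda}{\sigma_j^2} = 0\,. \tag{7.2.7}$$

It has the solution

$$\tilde{\lambda} = \frac{\sum_{j=1}^{N} x^{(j)}/\sigma_j^2}{\sum_{j=1}^{N} 1/\sigma_j^2}\,. \tag{7.2.8}$$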
Since $\mathrm{d}^2\ell/\mathrm{d}\lambda^2 = -\sum_{j}\sigma_j^{-2} < 0$, the solution is, in fact, a maximum. Thus we see that we obtain the maximum likelihood estimator as the mean of the $N$ measurements weighted inversely by the variances of the individual measurements. ■
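The estimator (7.2.8) is easily coded. The following Java sketch (the class name WeightedMean and the data in main are illustrative assumptions) computes the inverse-variance weighted mean together with the second derivative $\mathrm{d}^2\ell/\mathrm{d}\lambda^2 = -\sum_j\sigma_j^{-2}$, whose negative sign confirms that the solution is a maximum.

```java
/** Sketch: maximum likelihood estimate (7.2.8) for measurements of differing accuracy. Names are illustrative. */
public class WeightedMean {

    /** lambda~ = sum(x_j / sigma_j^2) / sum(1 / sigma_j^2). */
    public static double estimate(double[] x, double[] sigma) {
        if (x.length != sigma.length || x.length == 0) {
            throw new IllegalArgumentException("need equal, non-zero numbers of values and errors");
        }
        double num = 0.0, den = 0.0;
        for (int j = 0; j < x.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);   // weight = inverse variance
            num += w * x[j];
            den += w;
        }
        return num / den;
    }

    /** d^2 l / d lambda^2 = -sum(1 / sigma_j^2), independent of the data values. */
    public static double secondDerivative(double[] sigma) {
        double s = 0.0;
        for (double sg : sigma) {
            s -= 1.0 / (sg * sg);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 3.0};
        double[] sigma = {1.0, 2.0};
        System.out.println(estimate(x, sigma));        // (1 + 3/4) / (1 + 1/4), prints 1.4
        System.out.println(secondDerivative(sigma));   // -(1 + 1/4), prints -1.25
    }
}
```

The measurement with the smaller error dominates: the estimate 1.4 lies much closer to 1.0 ($\sigma = 1$) than to 3.0 ($\sigma = 2$).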


Example  7.3:  Estimation  of  the  parameter  N  of  the  hypergeometric 
distribution 


As in the example with coins at the beginning of this chapter, sometimes parameters to be estimated can only take on discrete values. In Example 5.2 we indicated the possibility of estimating zoological population densities by means of tagging and recapture. According to (5.3.1), the probability to catch exactly $n$ fish, of which $k$ are tagged, out of a pond with an unknown total of $N$ fish, out of which $K$ are tagged, is given by

$$L(k;n,K,N) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\,.$$

We must now find the value of $N$ for which the function $L$ is maximum. For this we use the ratio

$$\frac{L(k;n,K,N)}{L(k;n,K,N-1)} = \frac{(N-n)(N-K)}{(N-n-K+k)\,N}\ \begin{cases} > 1\,, & Nk < nK\,,\\ < 1\,, & Nk > nK\,.\end{cases}$$

The function $L$ thus increases with $N$ as long as $N < nK/k$ and decreases for $N > nK/k$. It is therefore maximal when $N$ is the largest integer not exceeding $nK/k$ (if $nK/k$ is itself an integer, $N = nK/k$ and $N = nK/k - 1$ give the same value of $L$). ■
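For concrete numbers the maximum can also be found by brute force, which is a convenient check of the ratio argument. In the following Java sketch the class RecaptureMle and the numbers $K = 50$, $n = 40$, $k = 12$ are illustrative assumptions; the search works with $\ln L$ to avoid overflow of the binomial coefficients.

```java
/**
 * Sketch: maximum likelihood estimate of the population size N in the
 * tag-and-recapture problem of Example 7.3, found by direct search over N.
 * The class name and the numbers in main are illustrative only.
 */
public class RecaptureMle {

    /** ln C(a, b) for 0 <= b <= a, computed as a sum of logarithms. */
    static double logBinom(int a, int b) {
        double s = 0.0;
        for (int i = 1; i <= b; i++) {
            s += Math.log((double) (a - b + i) / i);
        }
        return s;
    }

    /** ln L(k; n, K, N) = ln[ C(K, k) C(N - K, n - k) / C(N, n) ]. */
    static double logLikelihood(int k, int n, int bigK, int bigN) {
        return logBinom(bigK, k) + logBinom(bigN - bigK, n - k) - logBinom(bigN, n);
    }

    /** Returns the N in [K + n - k, nMax] with the largest likelihood (the first one found on ties). */
    public static int mle(int k, int n, int bigK, int nMax) {
        int nMin = bigK + (n - k);   // at least the K tagged fish plus the n - k untagged fish caught
        int best = nMin;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (int bigN = nMin; bigN <= nMax; bigN++) {
            double ll = logLikelihood(k, n, bigK, bigN);
            if (ll > bestLog) {
                bestLog = ll;
                best = bigN;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int bigK = 50, n = 40, k = 12;   // illustrative numbers
        System.out.println("N_hat = " + mle(k, n, bigK, 10000));
        System.out.println("nK/k  = " + (double) n * bigK / k);
    }
}
```

With these numbers the search returns $N = 166$; this is the largest integer below $nK/k \approx 166.7$, as follows from the ratio above, which exceeds 1 exactly when $Nk < nK$.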


7.3  Information  Inequality.  Minimum  Variance 
Estimators.  Sufficient  Estimators 

We now want to discuss once more the quality of an estimator. In Sect. 6.1 we called an estimator unbiased if for every sample the bias vanished,

$$B(\lambda) = E(S) - \lambda = 0\,. \tag{7.3.1}$$

Lack of bias is, however, not the only characteristic required of a “good” estimator. More importantly, one should require that the variance

$$\sigma^2(S)$$

is small. Here one must often find a compromise, since there is a connection between $B$ and $\sigma^2$, described by the information inequality.†

† This inequality was independently found by H. Cramer, M. Frechet, and C. R. Rao as well as by other authors. It is also called the Cramer-Rao or Frechet inequality.

One immediately sees that it is easy to achieve $\sigma^2(S) = 0$ simply by using a constant for $S$. We consider an estimator $S(x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ that is
a function of the sample $x^{(1)}, x^{(2)}, \ldots, x^{(N)}$. According to (6.1.3) and (6.1.4) the joint probability density of the elements of the sample is

$$f(x^{(1)}, x^{(2)}, \ldots, x^{(N)};\lambda) = f(x^{(1)};\lambda)\,f(x^{(2)};\lambda)\cdots f(x^{(N)};\lambda)\,.$$

The expectation value of $S$ is thus

$$E(S) = \int\!\cdots\!\int S(x^{(1)},\ldots,x^{(N)})\,f(x^{(1)};\lambda)\cdots f(x^{(N)};\lambda)\,\mathrm{d}x^{(1)}\,\mathrm{d}x^{(2)}\cdots\mathrm{d}x^{(N)}\,. \tag{7.3.2}$$


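According to (7.3.1), however, one also has

$$E(S) = B(\lambda) + \lambda\,.$$

We now assume that we can differentiate with respect to $\lambda$ in the integral. We then obtain

$$1 + B'(\lambda) = \int\!\cdots\!\int S\sum_{j=1}^{N}\frac{f'(x^{(j)};\lambda)}{f(x^{(j)};\lambda)}\,f(x^{(1)};\lambda)\cdots f(x^{(N)};\lambda)\,\mathrm{d}x^{(1)}\cdots\mathrm{d}x^{(N)}\,,$$

which is equivalent to

$$1 + B'(\lambda) = E\left\{S\sum_{j=1}^{N}\frac{f'(x^{(j)};\lambda)}{f(x^{(j)};\lambda)}\right\} = E\left\{S\sum_{j=1}^{N}\varphi(x^{(j)};\lambda)\right\}.$$

From (7.2.3) we have

$$\ell' = \sum_{j=1}^{N}\varphi(x^{(j)};\lambda)$$

and therefore

$$1 + B'(\lambda) = E\{S\ell'\}\,. \tag{7.3.3}$$

One clearly has

$$\int\!\cdots\!\int f(x^{(1)};\lambda)\cdots f(x^{(N)};\lambda)\,\mathrm{d}x^{(1)}\cdots\mathrm{d}x^{(N)} = 1\,.$$

If we also compute the derivative with respect to $\lambda$, we obtain

$$\int\!\cdots\!\int\sum_{j=1}^{N}\frac{f'(x^{(j)};\lambda)}{f(x^{(j)};\lambda)}\,f(x^{(1)};\lambda)\cdots f(x^{(N)};\lambda)\,\mathrm{d}x^{(1)}\cdots\mathrm{d}x^{(N)} = E(\ell') = 0\,.$$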
By multiplying this equation by $E(S)$ and subtracting the result from (7.3.3) one obtains

$$1 + B'(\lambda) = E\{S\ell'\} - E(S)\,E(\ell') = E\{[S - E(S)]\,\ell'\}\,. \tag{7.3.4}$$


In order to see the significance of this expression, we need to use the Cauchy-Schwarz inequality in the following form:

If $x$ and $y$ are random variables and if $x^2$ and $y^2$ have finite expectation values, then

$$\{E(xy)\}^2 \le E(x^2)\,E(y^2)\,. \tag{7.3.5}$$

To prove this inequality we consider the expression

$$E((ax+y)^2) = a^2E(x^2) + 2aE(xy) + E(y^2) \ge 0\,. \tag{7.3.6}$$

This is a non-negative number for all values of $a$. If we consider for the moment the case of equality, then this is a quadratic equation for $a$ with the solutions

$$a_{1,2} = -\frac{E(xy)}{E(x^2)} \pm \sqrt{\left(\frac{E(xy)}{E(x^2)}\right)^2 - \frac{E(y^2)}{E(x^2)}}\,. \tag{7.3.7}$$

The inequality (7.3.6) is then valid for all $a$ if the term under the square root is negative or zero. From this follows the assertion

$$\frac{\{E(xy)\}^2}{\{E(x^2)\}^2} - \frac{E(y^2)}{E(x^2)} \le 0\,.$$

If we now apply the inequality (7.3.5) to (7.3.4), one obtains

$$\{1 + B'(\lambda)\}^2 \le E\{[S - E(S)]^2\}\,E(\ell'^2)\,. \tag{7.3.8}$$


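We now use (7.2.3) in order to rewrite the expression for $E(\ell'^2)$,

$$E(\ell'^2) = E\left\{\left(\sum_{j=1}^{N}\varphi(x^{(j)};\lambda)\right)^2\right\} = E\left\{\sum_{j=1}^{N}\left(\varphi(x^{(j)};\lambda)\right)^2\right\} + E\left\{\sum_{i\ne j}\varphi(x^{(i)};\lambda)\,\varphi(x^{(j)};\lambda)\right\}.$$

All terms of the second sum on the right-hand side vanish, since for $i \ne j$

$$E\{\varphi(x^{(i)};\lambda)\,\varphi(x^{(j)};\lambda)\} = E\{\varphi(x^{(i)};\lambda)\}\,E\{\varphi(x^{(j)};\lambda)\}\,,$$

$$E\{\varphi(x;\lambda)\} = \int_{-\infty}^{\infty}\frac{f'(x;\lambda)}{f(x;\lambda)}\,f(x;\lambda)\,\mathrm{d}x = \int_{-\infty}^{\infty}f'(x;\lambda)\,\mathrm{d}x\,,$$

and

$$\int_{-\infty}^{\infty}f(x;\lambda)\,\mathrm{d}x = 1\,.$$

By differentiating the last line with respect to $\lambda$ one obtains

$$\int_{-\infty}^{\infty}f'(x;\lambda)\,\mathrm{d}x = 0\,.$$

Thus one has simply

$$E(\ell'^2) = E\left\{\sum_{j=1}^{N}\left(\varphi(x^{(j)};\lambda)\right)^2\right\}.$$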

The expectation value of the sum is simply the sum of the expectation values, and since all x^(j) follow the same distribution, the individual expectation values do not depend on j. Therefore one has

$$I(\lambda) = E(\ell'^2) = N\,E\left\{\left(\frac{f'(x;\lambda)}{f(x;\lambda)}\right)^2\right\}.$$

This expression is called the information of the sample with respect to λ. It is a non-negative number, which vanishes if the likelihood function does not depend on the parameter λ.

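It is sometimes useful to write the information in a somewhat different form. To do this we differentiate the expression

$$\int_{-\infty}^{\infty}\frac{f'(x;\lambda)}{f(x;\lambda)}\,f(x;\lambda)\,\mathrm{d}x = 0$$

once more with respect to λ and obtain

$$0 = \int_{-\infty}^{\infty}\left\{\left(\frac{f'}{f}\right)' f + \frac{f'}{f}\,f'\right\}\mathrm{d}x = \int_{-\infty}^{\infty}\left\{\left(\frac{f'}{f}\right)' + \left(\frac{f'}{f}\right)^2\right\} f\,\mathrm{d}x.$$

The information can then be written as

$$I(\lambda) = N\,E\left\{\left(\frac{f'(x;\lambda)}{f(x;\lambda)}\right)^2\right\} = -N\,E\left\{\left(\frac{f'(x;\lambda)}{f(x;\lambda)}\right)'\right\}$$

or

$$I(\lambda) = E(\ell'^2) = -E(\ell''). \tag{7.3.9}$$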
The inequality (7.3.8) can now be written in the following way:

$$\{1 + B'(\lambda)\}^2 \le \sigma^2(S)\,I(\lambda)$$

or

$$\sigma^2(S) \ge \frac{\{1 + B'(\lambda)\}^2}{I(\lambda)}. \tag{7.3.10}$$

This is the information inequality. It gives the connection between the bias and the variance of an estimator and the information of a sample. It should be noted that in its derivation no assumption about the estimator was made. The right-hand side of the inequality (7.3.10) is therefore a lower bound for the variance of any estimator with bias B(λ). It is called the minimum variance bound or Cramér-Rao bound. In cases where the bias does not depend on λ, i.e., particularly in cases of vanishing bias, the inequality (7.3.10) simplifies to

$$\sigma^2(S) \ge 1/I(\lambda). \tag{7.3.11}$$


This  relation  justifies  using  the  name  information.  As  the  information  of  a 
sample  increases,  the  variance  of  an  estimator  can  be  made  smaller. 

We now ask under which circumstances the minimum variance bound is attained, or explicitly, when the equals sign in the relation (7.3.10) holds. In the inequality (7.3.6), one has equality only if ax + y vanishes, since only then does one have E{(ax + y)²} = 0. Applied to (7.3.8), this means that

$$\ell' + a\,(S - E(S)) = 0$$

or

$$\ell' = A(\lambda)\,(S - E(S)). \tag{7.3.12}$$

Here A means an arbitrary quantity that does not depend on the sample x^(1), x^(2), ..., x^(N), but may, however, be a function of λ. By integration we obtain

$$\ell = \int \ell'\,\mathrm{d}\lambda = B(\lambda)S + C(\lambda) + D \tag{7.3.13}$$

and finally

$$L = d\,\exp\{B(\lambda)S + C(\lambda)\}. \tag{7.3.14}$$

The quantities d and D do not depend on λ.

We  thus  see  that  estimators  attain  the  minimum  variance  bound  when  the 
likelihood  function  is  of  the  special  form  (7.3.14).  They  are  therefore  called 
minimum  variance  estimators. 

For the case of an unbiased minimum variance estimator we obtain from (7.3.11)

$$\sigma^2(S) = \frac{1}{I(\lambda)} = \frac{1}{E(\ell'^2)}. \tag{7.3.15}$$

By substituting (7.3.12) one obtains

$$\sigma^2(S) = \frac{1}{(A(\lambda))^2\,E\{(S - E(S))^2\}} = \frac{1}{(A(\lambda))^2\,\sigma^2(S)}$$

or

$$\sigma^2(S) = \frac{1}{|A(\lambda)|}. \tag{7.3.16}$$

If instead of (7.3.14) only the weaker requirement

$$L = g(S,\lambda)\,c(x^{(1)}, x^{(2)}, \ldots, x^{(N)}) \tag{7.3.17}$$

holds, then the estimator S is said to be sufficient for λ. One can show [see, e.g., Kendall and Stuart, Vol. 2 (1967)] that no other estimator can contribute information about λ that is not already contained in S if the requirement (7.3.17) is fulfilled. Hence the name “sufficient estimator” (or statistic).


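Example 7.4: Estimator for the parameter of the Poisson distribution
Consider the Poisson distribution (5.4.1)

$$f(k) = \frac{\lambda^k}{k!}\,e^{-\lambda}.$$

The log-likelihood function of a sample k^(1), k^(2), ..., k^(N) is

$$\ell = \sum_{j=1}^{N}\left(k^{(j)}\ln\lambda - \ln(k^{(j)}!) - \lambda\right)$$

and its derivative with respect to λ is

$$\frac{\mathrm{d}\ell}{\mathrm{d}\lambda} = \ell' = \sum_{j=1}^{N}\left(\frac{k^{(j)}}{\lambda} - 1\right) = \frac{N}{\lambda}\,(\bar{k} - \lambda). \tag{7.3.18}$$

Comparing with (7.3.12) and (7.3.16) shows that the arithmetic mean k̄ is an unbiased minimum variance estimator with variance λ/N. ■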
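The minimum variance bound can be checked by simulation. The following Python sketch (standard library only) draws many Poisson samples of size N, computes k̄ for each, and compares the spread of k̄ with 1/I(λ) = λ/N, which follows from (7.3.18). The Poisson generator uses Knuth's multiplication method, chosen here only to stay self-contained; it is not the book's generator, and all function names are illustrative.

```python
import math
import random


def poisson_variate(lam, rng):
    """One Poisson-distributed count with mean lam (Knuth's multiplication method).

    Suitable for small lam only: exp(-lam) underflows for very large lam.
    """
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def variance_of_sample_mean(lam, n, n_experiments, seed=1):
    """Empirical variance of k-bar over n_experiments simulated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_experiments):
        means.append(sum(poisson_variate(lam, rng) for _ in range(n)) / n)
    grand_mean = sum(means) / n_experiments
    return sum((m - grand_mean) ** 2 for m in means) / (n_experiments - 1)


def minimum_variance_bound(lam, n):
    """1/I(lambda), with I(lambda) = N/lambda for the Poisson log-likelihood of Example 7.4."""
    return lam / n


if __name__ == "__main__":
    lam, n = 3.0, 10
    print("bound 1/I(lambda):", minimum_variance_bound(lam, n))
    print("simulated var(k-bar):", variance_of_sample_mean(lam, n, 10000))
```

With λ = 3 and N = 10 the simulated variance should lie close to the bound 0.3; the remaining deviation of a few percent reflects the finite number of simulated experiments.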
Example 7.5: Estimator for the parameter of the binomial distribution

The likelihood function of a sample from a binomial distribution with the parameters p = λ, q = 1 − λ is given directly by (5.1.3),

$$L(k,\lambda) = \binom{n}{k}\lambda^k(1-\lambda)^{n-k}.$$

(The result of the sample can be summarized by the statement that in n experiments, the event A occurred k times; see Sect. 5.1.) One then has

$$\ell = \ln L = k\ln\lambda + (n-k)\ln(1-\lambda) + \ln\binom{n}{k},$$

$$\ell' = \frac{k}{\lambda} - \frac{n-k}{1-\lambda} = \frac{n}{\lambda(1-\lambda)}\left(\frac{k}{n} - \lambda\right).$$

By comparing with (7.3.12) and (7.3.16) one finds k/n to be a minimum variance estimator with variance λ(1 − λ)/n. ■
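The algebra that brings ℓ' into the form (7.3.12) can be verified numerically. This sketch, with illustrative function names, compares the closed form A(λ)(k/n − λ) with a central finite-difference derivative of ℓ and evaluates the variance 1/|A(λ)| from (7.3.16); the step size h = 1e-6 is an assumption suited to λ well inside (0, 1).

```python
import math


def log_likelihood(lam, k, n):
    """l = k ln(lambda) + (n - k) ln(1 - lambda) + ln C(n, k), as in Example 7.5."""
    return k * math.log(lam) + (n - k) * math.log(1.0 - lam) + math.log(math.comb(n, k))


def a_of_lambda(lam, n):
    """The factor A(lambda) = n / (lambda (1 - lambda)) in l' = A(lambda) (k/n - lambda)."""
    return n / (lam * (1.0 - lam))


def score_closed_form(lam, k, n):
    """l'(lambda) in the form (7.3.12), with S = k/n and E(S) = lambda."""
    return a_of_lambda(lam, n) * (k / n - lam)


def score_numeric(lam, k, n, h=1e-6):
    """Central finite-difference approximation of l'(lambda)."""
    return (log_likelihood(lam + h, k, n) - log_likelihood(lam - h, k, n)) / (2.0 * h)


def minimum_variance(lam, n):
    """Variance 1/|A(lambda)| from (7.3.16), i.e. lambda (1 - lambda) / n."""
    return 1.0 / abs(a_of_lambda(lam, n))
```

For example, score_closed_form(0.3, 7, 20) and score_numeric(0.3, 7, 20) agree to many digits, and the score vanishes at λ = k/n.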


Example 7.6: Law of error combination (“Quadratic averaging of individual errors”)

We now return to the problem of Example 7.2 of repeated measurements of the same quantity with varying uncertainties, or expressed in another way, to the problem of drawing a sample from normal distributions with the same mean λ and different but known variances σ_j². From (7.2.7) we obtain

$$\frac{\mathrm{d}\ell}{\mathrm{d}\lambda} = \sum_{j=1}^{N}\frac{x^{(j)} - \lambda}{\sigma_j^2}.$$

We can rewrite this expression as

$$\frac{\mathrm{d}\ell}{\mathrm{d}\lambda} = \left(\sum_{j=1}^{N}\frac{1}{\sigma_j^2}\right)\left(\frac{\sum_j x^{(j)}/\sigma_j^2}{\sum_j 1/\sigma_j^2} - \lambda\right).$$

As in Example 7.2 we recognize

$$S = \tilde\lambda = \frac{\sum_j x^{(j)}/\sigma_j^2}{\sum_j 1/\sigma_j^2} \tag{7.3.19}$$

as an unbiased estimator for λ. Comparing with (7.3.12) shows that it is also a minimum variance estimator. From (7.3.16) one determines that its variance is

$$\sigma^2(\tilde\lambda) = \left(\sum_{j=1}^{N}\frac{1}{\sigma_j^2}\right)^{-1}. \tag{7.3.20}$$

The relation (7.3.20) often goes by the name of the law of error combination or quadratic averaging of individual errors. It could have been obtained by application of the rule of error propagation (3.8.7) to (7.3.19). If we identify σ(λ̃) as the error Δλ̃ of the estimator λ̃ and σ_j as the error Δx_j of the jth measurement, then we can write it in its usual form

$$\frac{1}{(\Delta\tilde\lambda)^2} = \sum_{j=1}^{N}\frac{1}{(\Delta x_j)^2}. \tag{7.3.21}$$

If all of the measurements have the same error σ = σ_j, Eqs. (7.3.19), (7.3.20) simplify to

$$\tilde\lambda = \bar{x}, \qquad \sigma^2(\tilde\lambda) = \sigma^2/N,$$

which we have already found in Sect. 6.2. ■
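Equations (7.3.19) and (7.3.20) translate directly into a few lines of code. The following Python sketch computes the weighted mean and its error for measurements with known individual errors; the function name and the sample numbers are hypothetical.

```python
import math


def weighted_mean(values, errors):
    """Weighted mean (7.3.19) and its error (7.3.20)/(7.3.21) for known errors sigma_j."""
    if not values or len(values) != len(errors):
        raise ValueError("values and errors must be non-empty and of equal length")
    weights = [1.0 / s ** 2 for s in errors]          # 1 / sigma_j^2
    sum_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / sum_w
    error = 1.0 / math.sqrt(sum_w)                    # (sum of 1/sigma_j^2)^(-1/2)
    return mean, error


if __name__ == "__main__":
    # Hypothetical measurements of the same quantity with different errors.
    print(weighted_mean([10.0, 12.0], [1.0, 2.0]))
```

For these two measurements the result is 10.4 with error about 0.894: the more precise first measurement dominates, and the combined error is smaller than either individual error.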


7.4  Asymptotic  Properties  of  the  Likelihood  Function 
and  Maximum-Likelihood  Estimators 

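We can now show heuristically several important properties of the likelihood function and maximum-likelihood estimators for very large data samples, that is, for the limit N → ∞. The estimator S = λ̃ was defined as the solution to the likelihood equation

$$\ell'(\tilde\lambda) = \sum_{j=1}^{N}\left(\frac{f'(x^{(j)};\lambda)}{f(x^{(j)};\lambda)}\right)_{\lambda=\tilde\lambda} = 0. \tag{7.4.1}$$

Let us assume that ℓ'(λ) can be differentiated with respect to λ one more time, so that we can expand it in a series around the point λ = λ̃,

$$\ell'(\lambda) = \ell'(\tilde\lambda) + (\lambda - \tilde\lambda)\,\ell''(\tilde\lambda) + \cdots. \tag{7.4.2}$$

The first term on the right side vanishes because of Eq. (7.4.1). In the second term one can write explicitly

$$\ell''(\tilde\lambda) = N\cdot\frac{1}{N}\sum_{j=1}^{N}\left(\frac{f'(x^{(j)};\lambda)}{f(x^{(j)};\lambda)}\right)'_{\lambda=\tilde\lambda}.$$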
This expression has the form of the mean value of a sample. For very large N it can be replaced by the expectation value of the population (Sect. 6.2),

$$\ell''(\tilde\lambda) = N\,E\left\{\left(\frac{f'(x;\tilde\lambda)}{f(x;\tilde\lambda)}\right)'\right\}. \tag{7.4.3}$$

Using Eq. (7.3.9) we can now write

$$\ell''(\tilde\lambda) = E(\ell''(\tilde\lambda)) = -E(\ell'^2(\tilde\lambda)) = -I(\tilde\lambda) = -1/b^2. \tag{7.4.4}$$


In this way we can replace the expression for ℓ''(λ̃), which is a function of the sample x^(1), x^(2), ..., x^(N), by the quantity −1/b², which only depends on the probability density f and the estimator λ̃. If one neglects higher-order terms, Eq. (7.4.2) can be expressed as

$$\ell'(\lambda) = -\frac{1}{b^2}\,(\lambda - \tilde\lambda). \tag{7.4.5}$$

By integration one obtains

$$\ell(\lambda) = -\frac{1}{2b^2}\,(\lambda - \tilde\lambda)^2 + c.$$

Inserting λ = λ̃ gives c = ℓ(λ̃), or

$$\ell(\lambda) - \ell(\tilde\lambda) = -\frac{1}{2b^2}\,(\lambda - \tilde\lambda)^2. \tag{7.4.6}$$

By exponentiation one obtains

$$L(\lambda) = k\,\exp\{-(\lambda - \tilde\lambda)^2/2b^2\}, \tag{7.4.7}$$

where k is a constant. The likelihood function L(λ) has the form of a normal distribution with mean λ̃ and variance b². At the values λ = λ̃ ± b, where λ is one standard deviation from λ̃, one has

$$-\{\ell(\lambda) - \ell(\tilde\lambda)\} = \tfrac{1}{2}. \tag{7.4.8}$$

We can now compare (7.4.7) with Eqs. (7.3.12) and (7.3.16). Since we are estimating the parameter λ, we must write S = λ̃ and thus E(S) = λ. The estimator λ̃ is therefore an unbiased minimum variance estimator with variance

$$\sigma^2(\tilde\lambda) = b^2 = \frac{1}{I(\tilde\lambda)} = \frac{1}{E(\ell'^2(\tilde\lambda))} = -\frac{1}{E(\ell''(\tilde\lambda))}. \tag{7.4.9}$$

Since the estimator λ̃ only possesses this property for the limiting case N → ∞, we call it asymptotically unbiased. This is equivalent to the statement that the maximum-likelihood estimator is consistent (Sect. 6.1). For the same reason the likelihood function is called asymptotically normal.

In Sect. 7.2 we interpreted the likelihood function L(λ) as a measure of the probability that the true value λ₀ of a parameter is equal to λ. The result of an estimator is often represented in abbreviated form,

$$\lambda = \tilde\lambda \pm \sigma(\tilde\lambda) = \tilde\lambda \pm \Delta\tilde\lambda.$$

Since the likelihood function is asymptotically normal, at least in the case of large samples, i.e., many measurements, this can be interpreted by saying that the probability that the true value λ₀ lies in the interval

$$\tilde\lambda - \Delta\tilde\lambda < \lambda_0 < \tilde\lambda + \Delta\tilde\lambda$$

is 68.3% (Sect. 5.8). In practice the relation above is used for large but finite samples. Unfortunately one cannot construct any general rule for determining when a sample is large enough for this procedure to be reliable. Clearly if N is finite, (7.4.3) is only an approximation, whose accuracy depends not only on N, but also on the particular probability density f(x; λ).


Example 7.7: Determination of the mean lifetime from a small number of decays

The probability that a radioactive nucleus, which exists at time t = 0, decays in the time interval between t and t + dt is

$$f(t)\,\mathrm{d}t = \frac{1}{\tau}\exp(-t/\tau)\,\mathrm{d}t.$$

For observed decay times t_1, t_2, ..., t_N the likelihood function is

$$L = \prod_{j=1}^{N}\frac{1}{\tau}\,e^{-t_j/\tau} = \frac{1}{\tau^N}\exp\left\{-\frac{1}{\tau}\sum_{j=1}^{N}t_j\right\}.$$

Its logarithm is

$$\ell = \ln L = -\frac{N}{\tau}\,\bar{t} - N\ln\tau$$

with the derivative

$$\ell' = \frac{N}{\tau^2}\,(\bar{t} - \tau).$$

Comparing with (7.3.12) we see that τ̃ = t̄ is the maximum-likelihood solution, which has a variance of σ²(τ̃) = τ²/N. For τ = τ̃ = t̄ one obtains

$$\Delta\tau = \bar{t}/\sqrt{N}.$$

For τ = τ̃ one has

$$\ell(\tilde\tau) = \ell_{\max} = -N(1 + \ln\bar{t}).$$

We can write

$$-\{\ell(\tau) - \ell(\tilde\tau)\} = N\left(\frac{\bar{t}}{\tau} + \ln\frac{\tau}{\bar{t}} - 1\right).$$

From this expression for the log-likelihood function one cannot so easily recognize the asymptotic form (7.4.6) for N → ∞. For small values of N it clearly does not have this form. Corresponding to (7.4.8), we want to use the values τ+ = τ̃ + Δ+ and τ− = τ̃ − Δ−, where one has

$$-\{\ell(\tau_\pm) - \ell(\tilde\tau)\} = \tfrac{1}{2}$$

for the asymmetric errors Δ+, Δ−. Clearly we expect for N → ∞ that Δ+, Δ− → Δτ = σ(τ̃).

In Fig. 7.2 the N observed decay times t_j are marked as vertical tick marks on the abscissa for various small values of N. In addition the function −(ℓ − ℓmax) = −(ℓ(τ) − ℓ(τ̃)) is plotted. The points τ+ and τ− are found where this curve intersects the horizontal line −(ℓ − ℓmax) = 1/2. The point τ̃ is indicated by an additional mark on the horizontal line. One sees that with increasing N the function −(ℓ − ℓmax) approaches more and more the symmetric parabolic form and that the errors Δ+, Δ−, and Δτ become closer to each other. ■
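The intersection points τ+ and τ− can be found numerically. The Python sketch below, with hypothetical function names, locates the two solutions of −(ℓ(τ) − ℓ(τ̃)) = 1/2 by bisection, one on each side of τ̃ = t̄, and returns them together with the symmetric error Δτ = t̄/√N. Bisection is simply a robust choice here, not necessarily the method used in the book's program E1MaxLife.

```python
import math


def neg_delta_loglike(tau, tbar, n):
    """-(l(tau) - l(tau_tilde)) = N (tbar/tau + ln(tau/tbar) - 1), as derived in Example 7.7."""
    return n * (tbar / tau + math.log(tau / tbar) - 1.0)


def bisect_root(func, lo, hi, iterations=200):
    """Root of func in [lo, hi]; func(lo) and func(hi) must have opposite signs."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lifetime_with_errors(times):
    """Return (tau_tilde, delta_minus, delta_plus, delta_tau) for observed decay times."""
    n = len(times)
    tbar = sum(times) / n                    # maximum-likelihood solution tau_tilde

    def excess(tau):
        # Zero where -(l - l_max) crosses 1/2, cf. (7.4.8).
        return neg_delta_loglike(tau, tbar, n) - 0.5

    # Below tbar: excess grows without bound as tau -> 0, and excess(tbar) = -1/2.
    tau_minus = bisect_root(excess, 1e-9 * tbar, tbar)
    # Above tbar: double the upper end until the curve is above 1/2 again.
    hi = 2.0 * tbar
    while excess(hi) < 0.0:
        hi *= 2.0
    tau_plus = bisect_root(excess, tbar, hi)
    return tbar, tbar - tau_minus, tau_plus - tbar, tbar / math.sqrt(n)
```

For two decays the errors come out clearly asymmetric, with Δ− < Δτ < Δ+; for many decays all three values nearly coincide, as described above.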

7.5  Simultaneous  Estimation  of  Several  Parameters: 
Confidence  Intervals 

We have already given a system of equations (7.2.5) allowing the simultaneous determination of p parameters λ = (λ1, λ2, ..., λp). It turns out that it is not the parameter determination but rather the estimation of their errors that becomes significantly more complicated in the case of several parameters. In particular we will need to consider correlations as well as errors of the parameters.

We extend our considerations from Sect. 7.4 on the properties of the likelihood function to the case of several parameters. The log-likelihood function

$$\ell(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \boldsymbol{\lambda}) = \sum_{j=1}^{N}\ln f(x^{(j)}; \boldsymbol{\lambda}) \tag{7.5.1}$$

can be expanded in a series about the point

$$\tilde{\boldsymbol{\lambda}} = (\tilde\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_p) \tag{7.5.2}$$

to give

$$\ell(\boldsymbol{\lambda}) = \ell(\tilde{\boldsymbol{\lambda}}) + \sum_{k=1}^{p}\left(\frac{\partial\ell}{\partial\lambda_k}\right)_{\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}}(\lambda_k - \tilde\lambda_k) + \frac{1}{2}\sum_{l=1}^{p}\sum_{m=1}^{p}\left(\frac{\partial^2\ell}{\partial\lambda_l\,\partial\lambda_m}\right)_{\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}}(\lambda_l - \tilde\lambda_l)(\lambda_m - \tilde\lambda_m) + \cdots \tag{7.5.3}$$

Fig. 7.2: Data and log-likelihood function for Example 7.7. [Each panel shows −(ℓ − ℓmax) as a function of τ, with the decay times marked on the t, τ axis. Panel values: N = 2, t̄ = 1.57: Δ− = 0.73, Δ+ = 1.99, Δτ = 1.11; N = 5, t̄ = 1.80: Δ− = 0.60, Δ+ = 1.15, Δτ = 0.80; further panels: Δ− = 0.46, Δ+ = 0.73, Δτ = 0.57; Δ− = 0.26, Δ+ = 0.36, Δτ = 0.30; Δ− = 0.13, Δ+ = 0.17, Δτ = 0.15; Δ− = 0.09, Δ+ = 0.11, Δτ = 0.10.]


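Since by the definition of λ̃ one has

$$\left(\frac{\partial\ell}{\partial\lambda_k}\right)_{\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}} = 0, \qquad k = 1, 2, \ldots, p, \tag{7.5.4}$$

the series simplifies to

$$-\{\ell(\boldsymbol{\lambda}) - \ell(\tilde{\boldsymbol{\lambda}})\} = \frac{1}{2}(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}})^{\mathrm T}A\,(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}}) + \cdots \tag{7.5.5}$$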


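with

$$-A = \begin{pmatrix}
\dfrac{\partial^2\ell}{\partial\lambda_1^2} & \dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_2} & \cdots & \dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_p}\\
\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_2} & \dfrac{\partial^2\ell}{\partial\lambda_2^2} & \cdots & \dfrac{\partial^2\ell}{\partial\lambda_2\,\partial\lambda_p}\\
\vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_p} & \dfrac{\partial^2\ell}{\partial\lambda_2\,\partial\lambda_p} & \cdots & \dfrac{\partial^2\ell}{\partial\lambda_p^2}
\end{pmatrix}_{\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}}. \tag{7.5.6}$$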


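In the limit N → ∞ we can replace the elements of A, which still depend on the specific sample, by the corresponding expectation values,

$$B = \begin{pmatrix}
E\left(-\dfrac{\partial^2\ell}{\partial\lambda_1^2}\right) & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_2}\right) & \cdots & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_p}\right)\\
E\left(-\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_2}\right) & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_2^2}\right) & \cdots & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_2\,\partial\lambda_p}\right)\\
\vdots & \vdots & \ddots & \vdots\\
E\left(-\dfrac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_p}\right) & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_2\,\partial\lambda_p}\right) & \cdots & E\left(-\dfrac{\partial^2\ell}{\partial\lambda_p^2}\right)
\end{pmatrix}_{\boldsymbol{\lambda}=\tilde{\boldsymbol{\lambda}}}. \tag{7.5.7}$$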


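If we neglect higher-order terms, we can give the likelihood function as

$$L = k\,\exp\{-\tfrac{1}{2}(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}})^{\mathrm T}B\,(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}})\}. \tag{7.5.8}$$

Comparing with (5.10.1) shows that this is a p-dimensional normal distribution with mean λ̃ and covariance matrix

$$C = B^{-1}. \tag{7.5.9}$$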

The variances of the maximum-likelihood estimators λ̃1, λ̃2, ..., λ̃p are given by the diagonal elements of the matrix (7.5.9). The off-diagonal elements are the covariances between all possible pairs of estimators,

$$\sigma^2(\tilde\lambda_i) = c_{ii}, \tag{7.5.10}$$

$$\operatorname{cov}(\tilde\lambda_j, \tilde\lambda_k) = c_{jk}. \tag{7.5.11}$$

For the correlation coefficient between the estimators λ̃j, λ̃k we can define

$$\rho(\tilde\lambda_j, \tilde\lambda_k) = \frac{\operatorname{cov}(\tilde\lambda_j, \tilde\lambda_k)}{\sigma(\tilde\lambda_j)\,\sigma(\tilde\lambda_k)}. \tag{7.5.12}$$

As in the case of a single parameter, the square roots of the variances are given as the errors or standard deviations of the estimators,

$$\Delta\tilde\lambda_i = \sigma(\tilde\lambda_i) = \sqrt{c_{ii}}. \tag{7.5.13}$$
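Relations (7.5.10) to (7.5.13) amount to a small transformation of the covariance matrix, sketched below in Python. The function name is illustrative, and the matrix is assumed to be given as a symmetric list of lists with positive diagonal elements.

```python
import math


def errors_and_correlations(cov):
    """Errors (7.5.13) and correlation coefficients (7.5.12) from a covariance matrix C.

    cov is a symmetric p x p list of lists with positive diagonal elements c_ii.
    """
    p = len(cov)
    errors = [math.sqrt(cov[i][i]) for i in range(p)]            # sqrt(c_ii)
    rho = [[cov[j][k] / (errors[j] * errors[k]) for k in range(p)]
           for j in range(p)]                                    # c_jk / (sigma_j sigma_k)
    return errors, rho
```

For C = [[4.0, 1.2], [1.2, 1.0]] this gives the errors 2 and 1 and a correlation coefficient of 0.6; the diagonal of the correlation matrix is always 1.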


In Sect. 7.4 we determined that by giving the maximum-likelihood estimator and its error one defines a region that contains the true value of the parameter with a probability of 68.3%. Since the likelihood function in the several-parameter case is asymptotically a Gaussian distribution of several variables, this region is not determined only by the errors, but rather by the entire covariance matrix. In the special case of two parameters this is the covariance ellipse, which we introduced in Sect. 5.10.

The expression (7.5.8) has (with the replacement x = λ) exactly the form of (5.10.1). We can therefore use all of the results of Sect. 5.10. For the exponent one has

$$-\tfrac{1}{2}(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}})^{\mathrm T}B\,(\boldsymbol{\lambda} - \tilde{\boldsymbol{\lambda}}) = -\tfrac{1}{2}\,g(\boldsymbol{\lambda}) = -\{\ell(\tilde{\boldsymbol{\lambda}}) - \ell(\boldsymbol{\lambda})\}. \tag{7.5.14}$$

In the parameter space spanned by λ1, λ2, ..., λp, the covariance ellipsoid of the distribution (7.5.8) is then determined by the condition

$$g(\boldsymbol{\lambda}) = 1 = 2\{\ell(\tilde{\boldsymbol{\lambda}}) - \ell(\boldsymbol{\lambda})\}. \tag{7.5.15}$$

For other values of g(λ) one obtains the confidence ellipsoids introduced in Sect. 5.10. For smaller values of N, the series (7.5.3) cannot be truncated and the approximation (7.5.7) is not valid. Nevertheless, the solution (7.5.4) can clearly still be computed. For a given probability W one obtains instead of a confidence ellipsoid a confidence region, contained within the hypersurface

$$g(\boldsymbol{\lambda}) = 2\{\ell(\tilde{\boldsymbol{\lambda}}) - \ell(\boldsymbol{\lambda})\} = \text{const}. \tag{7.5.16}$$

The value of g is determined in the same way as for the confidence ellipsoid in Sect. 5.10.

In Example 7.7 we computed the region λ̃ − Δ− < λ < λ̃ + Δ+ for the case of a single variable. This corresponds to a confidence region with the probability content 68.3%.
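Example 7.8: Estimation of the mean and variance of a normal distribution

We want to determine the mean λ1 and standard deviation λ2 of a normal distribution using a sample of size N. This problem occurs, for example, in the measurement of the range of α-particles in matter. Because of the statistical nature of the energy loss through a large number of independent individual collisions, the range of the individual particles is Gaussian distributed about some mean value. By measuring the range x^(j) of N different particles, the mean λ1 and “straggling constant” λ2 = σ can be estimated. We obtain the likelihood function

$$L = \prod_{j=1}^{N}\frac{1}{\lambda_2\sqrt{2\pi}}\exp\left\{-\frac{(x^{(j)} - \lambda_1)^2}{2\lambda_2^2}\right\}$$

and

$$\ell = -\frac{1}{2\lambda_2^2}\sum_{j=1}^{N}(x^{(j)} - \lambda_1)^2 - N\ln\lambda_2 + \text{const}.$$

The system of likelihood equations is

$$\frac{\partial\ell}{\partial\lambda_1} = \frac{1}{\lambda_2^2}\sum_{j=1}^{N}(x^{(j)} - \lambda_1) = 0,$$

$$\frac{\partial\ell}{\partial\lambda_2} = \frac{1}{\lambda_2^3}\sum_{j=1}^{N}(x^{(j)} - \lambda_1)^2 - \frac{N}{\lambda_2} = 0.$$

Its solution is

$$\tilde\lambda_1 = \frac{1}{N}\sum_{j=1}^{N}x^{(j)} = \bar{x}, \qquad \tilde\lambda_2 = \sqrt{\frac{1}{N}\sum_{j=1}^{N}(x^{(j)} - \bar{x})^2}.$$

For the estimator of the mean, the maximum-likelihood method leads to the arithmetic mean of the individual measurements. For the variance it gives the quantity s'² (6.2.4), which has a small bias, and not s², the unbiased estimator (6.2.6).

Let us now determine the matrix B. The second derivatives are

$$\frac{\partial^2\ell}{\partial\lambda_1^2} = -\frac{N}{\lambda_2^2}, \qquad \frac{\partial^2\ell}{\partial\lambda_1\,\partial\lambda_2} = -\frac{2\sum_j(x^{(j)} - \lambda_1)}{\lambda_2^3}, \qquad \frac{\partial^2\ell}{\partial\lambda_2^2} = -\frac{3\sum_j(x^{(j)} - \lambda_1)^2}{\lambda_2^4} + \frac{N}{\lambda_2^2}.$$

We use the procedure of (7.5.7), substitute λ1, λ2 by λ̃1, λ̃2, and find

$$B = \begin{pmatrix} N/\tilde\lambda_2^2 & 0\\ 0 & 2N/\tilde\lambda_2^2 \end{pmatrix}$$

or for the covariance matrix

$$C = B^{-1} = \begin{pmatrix} \tilde\lambda_2^2/N & 0\\ 0 & \tilde\lambda_2^2/2N \end{pmatrix}.$$

We interpret the square roots of the diagonal elements as the errors of the corresponding parameters, i.e.,

$$\Delta\tilde\lambda_1 = \tilde\lambda_2/\sqrt{N}, \qquad \Delta\tilde\lambda_2 = \tilde\lambda_2/\sqrt{2N}.$$

The estimators for λ1 and λ2 are not correlated. ■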
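As a cross-check of the matrix B, the following Python sketch computes the closed-form estimates, builds the matrix A of (7.5.6) from finite-difference second derivatives of ℓ at (λ̃1, λ̃2), and inverts it. For this example the observed matrix A at the maximum-likelihood point coincides with B, so the result should reproduce C = diag(λ̃2²/N, λ̃2²/2N). The relative step size of 1e-4 times λ̃2 and the function names are assumptions made for this sketch.

```python
import math


def normal_loglike(lam1, lam2, xs):
    """l(lambda1, lambda2) of Example 7.8 without the additive constant."""
    return (-sum((x - lam1) ** 2 for x in xs) / (2.0 * lam2 ** 2)
            - len(xs) * math.log(lam2))


def normal_mle(xs):
    """Solution of the likelihood equations: arithmetic mean and sqrt of s'^2 (divides by N)."""
    n = len(xs)
    lam1 = sum(xs) / n
    lam2 = math.sqrt(sum((x - lam1) ** 2 for x in xs) / n)
    return lam1, lam2


def numeric_covariance(xs, rel_step=1e-4):
    """C = A^{-1}, with A = minus the finite-difference Hessian of l at the MLE, cf. (7.5.6)."""
    l1, l2 = normal_mle(xs)
    h = rel_step * l2                       # step scaled to the width of the distribution

    def ell(a, b):
        return normal_loglike(a, b, xs)

    d11 = (ell(l1 + h, l2) - 2.0 * ell(l1, l2) + ell(l1 - h, l2)) / h ** 2
    d22 = (ell(l1, l2 + h) - 2.0 * ell(l1, l2) + ell(l1, l2 - h)) / h ** 2
    d12 = (ell(l1 + h, l2 + h) - ell(l1 + h, l2 - h)
           - ell(l1 - h, l2 + h) + ell(l1 - h, l2 - h)) / (4.0 * h ** 2)
    a11, a12, a22 = -d11, -d12, -d22
    det = a11 * a22 - a12 ** 2
    return [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
```

The same finite-difference approach also works when no closed form for B is available, although the step size then has to be chosen with care.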


Example 7.9: Estimators for the parameters of a two-dimensional normal distribution

To conclude we consider a population described by a two-dimensional normal distribution (Sect. 5.10)

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1 - a_1)^2}{\sigma_1^2} + \frac{(x_2 - a_2)^2}{\sigma_2^2} - 2\rho\,\frac{(x_1 - a_1)(x_2 - a_2)}{\sigma_1\sigma_2}\right]\right\}.$$

By constructing and solving a system of five simultaneous likelihood equations for the five parameters a1, a2, σ1², σ2², ρ we obtain for the maximum-likelihood estimators

$$\tilde{a}_1 = \bar{x}_1 = \frac{1}{N}\sum_{j=1}^{N}x_1^{(j)}, \qquad \tilde{a}_2 = \bar{x}_2 = \frac{1}{N}\sum_{j=1}^{N}x_2^{(j)},$$

$$\tilde\sigma_1^2 = s_1'^2 = \frac{1}{N}\sum_{j=1}^{N}(x_1^{(j)} - \bar{x}_1)^2, \qquad \tilde\sigma_2^2 = s_2'^2 = \frac{1}{N}\sum_{j=1}^{N}(x_2^{(j)} - \bar{x}_2)^2,$$

$$\tilde\rho = r = \frac{\sum_{j=1}^{N}(x_1^{(j)} - \bar{x}_1)(x_2^{(j)} - \bar{x}_2)}{N\,s_1'\,s_2'}. \tag{7.5.17}$$

Exactly as in Example 7.8, the estimators for the variances s1'² and s2'² are biased. This also holds for the expression (7.5.17), the sample correlation coefficient r. Like all maximum-likelihood estimators, r is consistent, i.e., it provides a good estimation of ρ for very large samples. For N → ∞ the probability density of the random variable r becomes a normal distribution with mean ρ and variance

$$\sigma^2(r) = (1 - \rho^2)^2/N. \tag{7.5.18}$$

For finite samples the distribution is asymmetric. It is therefore important to have a sufficiently large sample before applying Eq. (7.5.18). As a rule of thumb, N > 500 is usually recommended. ■
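The estimators (7.5.17) and the asymptotic error (7.5.18) can be sketched in Python as follows. The simulation helper generates pairs with zero means and unit variances as x2 = ρ z1 + √(1 − ρ²) z2 from two independent standard normal numbers; this is a simplified stand-in for the book's multivariate Gaussian generator, and all names are hypothetical.

```python
import math
import random


def sample_correlation(x1, x2):
    """Estimates (7.5.17): means, s'_1, s'_2 (dividing by N), and r."""
    n = len(x1)
    if n < 2 or n != len(x2):
        raise ValueError("need two samples of equal length N >= 2")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    r = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)
    return m1, m2, s1, s2, r


def asymptotic_error_r(rho, n):
    """sigma(r) = (1 - rho^2)/sqrt(N) from (7.5.18); meaningful only for large N."""
    return (1.0 - rho ** 2) / math.sqrt(n)


def simulate_standard_bivariate(rho, n, seed=0):
    """N pairs from a bivariate normal with zero means, unit variances and correlation rho."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1.append(z1)
        x2.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2
```

Repeating the simulation with several seeds and N = 5, 50, 500 gives a quick impression of how strongly r fluctuates for small samples, in the spirit of the suggestions for Example Program 7.2.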


7.6  Example  Programs 

Example Program 7.1: The class E1MaxLife computes the mean lifetime and its asymmetric errors from a small number of radioactive decays

The program performs the computations and the graphical display for the problem described in Example 7.7. First, by the Monte Carlo method, a total of N decay times t_i of radioactive nuclei with a mean lifetime of τ = 1 are simulated. The number N of decays and also the seeds for the random number generator are entered interactively.

Example Program 7.2: The class E2MaxLife computes the maximum-likelihood estimates of the parameters of a bivariate normal distribution from a simulated sample

The program asks interactively for the number n_exp of experiments to simulate (i.e., of the samples to be treated consecutively), for the size n_pt of each sample and for the means a1, a2, the standard deviations σ1, σ2, and the correlation coefficient ρ of a bivariate Gaussian distribution.

The covariance matrix C of the normal distribution is calculated and the generator of random numbers from a multivariate Gaussian distribution is initialized. Each sample is generated and then analyzed, i.e., the quantities x̄1, x̄2, s'1, s'2, and r are computed, which are estimates of a1, a2, σ1, σ2, and ρ [cf. (7.5.17)]. The quantities are displayed for each sample.

Suggestions: Choose n_exp = 20, keep all other parameters fixed, and study the statistical fluctuations of r for n_pt = 5, 50, 500. Use the values ρ = 0, 0.5, 0.95.


8.  Testing  Statistical  Hypotheses 


8.1  Introduction 


Often  the  problem  of  a  statistical  analysis  does  not  involve  determining  an 
originally  unknown  parameter,  but  rather,  one  already  has  a  preconceived 
opinion  about  the  value  of  the  parameter,  i.e.,  a  hypothesis.  In  a  sample  taken 
for  quality  control,  for  example,  one  might  initially  assume  that  certain  critical 
values  are  normally  distributed  within  tolerance  levels  around  their  nominal 
values.  One  would  now  like  to  test  this  hypothesis.  To  elucidate  such  test 
procedures,  called  statistical  tests,  we  will  consider  such  an  example  and  for 
simplicity  make  the  hypothesis  that  a  sample  of  size  10  originates  from  a 
standard  normal  distribution. 

Suppose the analysis of the sample resulted in the arithmetic mean x̄ = 0.154. Under the assumption that our hypothesis is correct, the random variable x̄ is normally distributed with mean 0 and standard deviation 1/√10.

We now ask for the probability to observe a value |x̄| > 0.154 from such a distribution. From (5.8.5) and Table I.3 this is

$$P(|\bar{x}| > 0.154) = 2\{1 - \psi_0(0.154\sqrt{10})\} = 0.62.$$

Thus we see that even if our hypothesis is correct, there is a probability of 62% that a sample of size 10 will lead to a sample mean that differs from the population mean by 0.154 or more.

We now find ourselves in the difficult situation of having to answer the simple question: “Is our hypothesis true or false?” A solution to this problem is provided by the concept of the significance level: One specifies before the test a (small) test probability α. Staying with our previous example, if P(|x̄| > 0.154) < α, then one would regard the occurrence of x̄ = 0.154 as improbable. That is, one would say that x̄ differs significantly from the hypothesized value and the hypothesis would be rejected. The converse is, however, not true. If P does not fall below α, we cannot say that “the hypothesis is true”, but rather “it is not contradicted by the result of the sample”. The choice of the significance level depends on the problem being considered. For quality control of pencils one might be satisfied with 1%. If, however, one wishes to determine insurance premiums such that the probability for the company to go bankrupt is less than α, then one would probably still regard 0.01% as too high. In the analysis of scientific data α values of 5, 1, or 0.1% are typically used. From Table I.3 we can obtain limiting values for |x̄| such that a deviation in excess of these values corresponds to the given probabilities. These are

$$0.05 = 2\{1 - \psi_0(1.96)\} = 2\{1 - \psi_0(0.62\sqrt{10})\},$$

$$0.01 = 2\{1 - \psi_0(2.58)\} = 2\{1 - \psi_0(0.82\sqrt{10})\},$$

$$0.001 = 2\{1 - \psi_0(3.29)\} = 2\{1 - \psi_0(1.04\sqrt{10})\}.$$

At these significance levels the value |x̄| would have to exceed the values 0.62, 0.82, 1.04 before we could reject the hypothesis.
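The probabilities and limiting values above can be reproduced without a table. In this Python sketch the distribution function ψ0 of the standard normal distribution is expressed through math.erf, and the limit for a given α is found by bisection on ψ0; the function names are illustrative.

```python
import math


def psi0(x):
    """Distribution function of the standard normal distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sided_p(xbar, n, mean=0.0, sigma=1.0):
    """P(|xbar - mean| >= observed) for xbar ~ N(mean, sigma^2/n): 2{1 - psi0(|d| sqrt(n)/sigma)}."""
    z = abs(xbar - mean) * math.sqrt(n) / sigma
    return 2.0 * (1.0 - psi0(z))


def critical_deviation(alpha, n, sigma=1.0):
    """Limit for |xbar - mean| at two-sided significance level alpha (bisection on psi0)."""
    target = 1.0 - 0.5 * alpha
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if psi0(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * sigma / math.sqrt(n)
```

For N = 10, two_sided_p(0.154, 10) lies between 0.62 and 0.63, and critical_deviation gives limits close to the values 0.62, 0.82, and 1.04 quoted above; small differences in the second decimal come from rounding the tabulated arguments.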

In some cases the sign of x̄ is important. In many production processes, deviations in a positive and negative direction are of different importance. (If a baker's rolls are too heavy, this reduces profits; if they are too light they cost the baker his license.) If one sets, e.g.,

$$P(\bar{x} > x'_\alpha) < \alpha,$$

then this is referred to as a one-sided test in contrast to the two-sided test, which we have already considered (Fig. 8.1).

Fig. 8.1: One-sided and two-sided tests


For many tests one does not construct the sample mean but rather a certain function of the sample called a test statistic, which is particularly suited for tests of certain hypotheses. As above one specifies a certain significance level α and determines a region U in the space of possible values of the test statistic T in which

$$P_H(T \in U) = \alpha.$$

The index H means that the probability was computed under the assumption that the hypothesis H is valid. One then obtains a sample, which results in a particular value T' for the test statistic. If T' is in the region U, the critical region of the test, then the hypothesis is rejected.

In  the  next  sections  we  will  discuss  some  important  tests  in  detail  and 
then  turn  to  a  more  rigorous  treatment  of  test  theory. 

8.2  F-Test  on  Equality  of  Variances 

The problem of comparing two variances occurs frequently in the development of measurement techniques or production procedures. Consider two populations with the same expectation value; e.g., one measures the same quantity with two different devices without systematic error. One may then ask if they also have the same variance.

To test this hypothesis we take samples of size N1 and N2 from both populations, which we assume to be normally distributed. We construct the sample variances (6.2.6) and consider the ratio

$$F = s_1^2/s_2^2. \tag{8.2.1}$$

If the hypothesis is true, then F will be near unity. It is known from Sect. 6.6 that for every sample we can construct a quantity that follows a χ²-distribution:

$$X_1^2 = \frac{(N_1 - 1)\,s_1^2}{\sigma_1^2} = \frac{f_1 s_1^2}{\sigma_1^2}, \qquad X_2^2 = \frac{(N_2 - 1)\,s_2^2}{\sigma_2^2} = \frac{f_2 s_2^2}{\sigma_2^2}.$$

The two distributions have f1 = (N1 − 1) and f2 = (N2 − 1) degrees of freedom.

Under the assumption that the hypothesis (σ1² = σ2²) is true, one has

$$F = \frac{f_2}{f_1}\,\frac{X_1^2}{X_2^2}.$$

The probability density of a χ²-distribution with f degrees of freedom is [see (6.6.10)]

$$f(\chi^2) = \frac{1}{\Gamma(\frac{1}{2}f)\,2^{\frac{1}{2}f}}\,(\chi^2)^{\frac{1}{2}(f-2)}\,e^{-\frac{1}{2}\chi^2}.$$

We now compute the probability

*We  use  here  the  symbol  W  for  a  distribution  function  in  order  to  avoid  confusion  with 
the  ratio  F. 
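$$W(Q) = P\left(\frac{X_1^2}{X_2^2} < Q\right)$$

that the ratio X1²/X2² is smaller than Q:

$$W(Q) = \frac{1}{\Gamma(\frac{1}{2}f_1)\,\Gamma(\frac{1}{2}f_2)\,2^{\frac{1}{2}(f_1+f_2)}}\iint_{x>0,\ y>0,\ x/y<Q} x^{\frac{1}{2}f_1-1}e^{-\frac{1}{2}x}\,y^{\frac{1}{2}f_2-1}e^{-\frac{1}{2}y}\,\mathrm{d}x\,\mathrm{d}y.$$

Calculating the integral gives

$$W(Q) = \frac{\Gamma(\frac{1}{2}f)}{\Gamma(\frac{1}{2}f_1)\,\Gamma(\frac{1}{2}f_2)}\int_0^Q t^{\frac{1}{2}f_1-1}(t+1)^{-\frac{1}{2}f}\,\mathrm{d}t, \tag{8.2.2}$$

where

$$f = f_1 + f_2.$$

Finally if one sets

$$F = Q\,f_2/f_1,$$

then the distribution function of the ratio F can be obtained from (8.2.2),

$$W(F) = P\left(\frac{s_1^2}{s_2^2} < F\right).$$

This is called the Fisher F-distribution.† It depends on the parameters f1 and f2. The probability density for the F-distribution is

$$f(F) = \left(\frac{f_1}{f_2}\right)^{\frac{1}{2}f_1}\frac{\Gamma(\frac{1}{2}(f_1+f_2))}{\Gamma(\frac{1}{2}f_1)\,\Gamma(\frac{1}{2}f_2)}\,F^{\frac{1}{2}f_1-1}\left(1 + \frac{f_1}{f_2}F\right)^{-\frac{1}{2}(f_1+f_2)}. \tag{8.2.3}$$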


This is shown in Fig. 8.2 for several values of f1 and f2. The distribution is reminiscent of the χ²-distribution; it is only non-vanishing for F > 0, and has a long tail for F → ∞. Therefore it cannot be symmetric. One can easily show that for f2 > 2 the expectation value is simply

$$E(F) = f_2/(f_2 - 2).$$
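The density (8.2.3) and the expectation value E(F) = f2/(f2 − 2) can be cross-checked numerically. The Python sketch below evaluates f(F) in logarithmic form with math.lgamma and integrates with a composite Simpson rule over the truncated range [0, 200]. The truncation, the number of subintervals, and the convention f(0) = 0 are assumptions that suit moderate f2 and f1 > 2 (such as f1 = 4, f2 = 10); they are not general-purpose settings.

```python
import math


def f_density(F, f1, f2):
    """Density (8.2.3) of the F-distribution, computed in log form via math.lgamma.

    Returns 0 for F <= 0; at F = 0 exactly this is the correct value only for f1 > 2.
    """
    if F <= 0.0:
        return 0.0
    log_norm = (0.5 * f1 * math.log(f1 / f2)
                + math.lgamma(0.5 * (f1 + f2))
                - math.lgamma(0.5 * f1) - math.lgamma(0.5 * f2))
    return math.exp(log_norm + (0.5 * f1 - 1.0) * math.log(F)
                    - 0.5 * (f1 + f2) * math.log1p(f1 * F / f2))


def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0
```

For f1 = 4 and f2 = 10 the integral of f(F) is close to 1 and the integral of F f(F) is close to 10/8 = 1.25. For f1 = 2 and very large f2 the density approaches e^(−F).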

We can now determine a limit F'_α with the requirement

$$P\left(\frac{s_1^2}{s_2^2} > F'_\alpha\right) = \alpha. \tag{8.2.4}$$

† This is also often called the v²-distribution, ω²-distribution, or Snedecor distribution.
This expression means that the limit F'_α is equal to the quantile F_{1−α} of the F-distribution (see Sect. 3.3) since

$$P\left(\frac{s_1^2}{s_2^2} < F'_\alpha\right) = 1 - \alpha. \tag{8.2.5}$$

If this limit is exceeded, then we say that σ1² > σ2² with the significance level α. The quantiles F_{1−α} for various pairs of values (f1, f2) are given in Table I.8. In general one applies a two-sided test, i.e., one tests whether the ratio F is between two limits F''_α and F'''_α, which are determined by

$$P\left(\frac{s_1^2}{s_2^2} > F''_\alpha\right) = \tfrac{1}{2}\alpha, \qquad P\left(\frac{s_1^2}{s_2^2} < F'''_\alpha\right) = \tfrac{1}{2}\alpha. \tag{8.2.6}$$

Fig. 8.2: Probability density of the F-distribution for fixed values of f1 = 2, 4, ..., 20. For f1 = 2 one has f(F) = e^(−F). For f1 > 2 the function has a maximum which increases for increasing f1.

Because of the definition of F as a ratio, the inequality

$$s_1^2/s_2^2 < F'''_\alpha(f_1, f_2)$$

clearly has the same meaning as

$$s_2^2/s_1^2 > F''_\alpha(f_2, f_1).$$


Here  the  first  argument  gives  the  number  of  degrees  of  freedom  in  the 
numerator,  and  the  second  in  the  denominator.  The  requirement  (8.2.6)  can 
also  be  written: 

P\^>K(fuh)\  =  l-0i,  P^§>F"(/2,/i)j=^  •  (8.2.7) 

Table  1.8  can  therefore  be  used  for  the  one-sided  as  well  as  the  two-sided 
F-test. 

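A glance at Table I.8 also shows that $F_{1-\frac12\alpha} > 1$ for all reasonable values of $\alpha$. Therefore one only needs to check whether the ratio of the larger to the smaller sample variance exceeds the upper limit,

$$ s_g^2/s_k^2 > F_{1-\frac12\alpha}(f_g, f_k) . \qquad (8.2.8) $$

Here the indices $g$ and $k$ denote the larger and the smaller of the two variances, i.e., $s_g^2 > s_k^2$, and $f_g$, $f_k$ are the corresponding numbers of degrees of freedom. If the inequality (8.2.8) is satisfied, the hypothesis of equal variances must be rejected.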
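The decision rule (8.2.8) can be sketched in Python as follows. Instead of reading Table I.8, this sketch computes the quantile $F_{1-\alpha/2}(f_g, f_k)$ by bisection on the distribution function. The distribution function itself is obtained by Simpson integration of the density (8.2.3). All function names are hypothetical. The numerical approach (Simpson rule plus bisection) is an assumption chosen for brevity, not for efficiency.

```python
import math

def f_density(F, f1, f2):
    """Density (8.2.3) of the F-distribution; returns 0 for F <= 0."""
    if F <= 0.0:
        return 0.0
    log_c = (0.5 * f1 * math.log(f1 / f2) + math.lgamma(0.5 * (f1 + f2))
             - math.lgamma(0.5 * f1) - math.lgamma(0.5 * f2))
    return math.exp(log_c + (0.5 * f1 - 1.0) * math.log(F)
                    - 0.5 * (f1 + f2) * math.log1p(f1 * F / f2))

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def f_quantile(p, f1, f2, tol=1e-9):
    """Quantile F_p(f1, f2): solve P(F < x) = p by bisection."""
    cdf = lambda x: simpson(lambda u: f_density(u, f1, f2), 0.0, x)
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:              # widen the bracket until it contains F_p
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:            # bisection
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_sided_f_test(s2_a, f_a, s2_b, f_b, alpha):
    """Rule (8.2.8): reject equal variances if s_g^2/s_k^2 > F_{1-alpha/2}(f_g, f_k)."""
    if s2_a >= s2_b:
        (s2_g, f_g), (s2_k, f_k) = (s2_a, f_a), (s2_b, f_b)
    else:
        (s2_g, f_g), (s2_k, f_k) = (s2_b, f_b), (s2_a, f_a)
    ratio = s2_g / s2_k
    limit = f_quantile(1.0 - 0.5 * alpha, f_g, f_k)
    return ratio, limit, ratio > limit

ratio, limit, reject = two_sided_f_test(34 / 9, 9, 39 / 6, 6, alpha=0.10)
print(f"ratio = {ratio:.2f}, limit = {limit:.2f}, reject: {reject}")
```

The same `f_quantile` function can be used to confirm numerically the reciprocity relation $F_{\alpha/2}(f_1, f_2) = 1/F_{1-\alpha/2}(f_2, f_1)$, which is what allows the test to use only the upper limit.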


Example 8.1: F-test of the hypothesis of equal variance of two series of measurements

A standard length (100 μm) is measured using two traveling microscopes. The measurements and computations are summarized in Table 8.1. From Table I.8 we find, for the two-sided F-test with a significance level of 10%,

$$ F'''_\alpha(6, 9) = F_{0.95}(6, 9) = 3.37 . $$

The observed ratio of the larger to the smaller variance, $F = s_2^2/s_1^2 = 6.5/3.7 \approx 1.76$, lies below this limit, so the hypothesis of equal variances cannot be rejected. ■
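The arithmetic of Example 8.1 can be reproduced with Python's standard `statistics` module. Its `variance` function uses the $1/(N-1)$ normalization of the sample variance. The limit 3.37 is the tabulated value quoted in the example and is simply hard-coded here.

```python
import statistics

# Readings from Table 8.1, in micrometres.
instrument_1 = [100, 101, 103, 98, 97, 98, 102, 101, 99, 101]
instrument_2 = [97, 102, 103, 96, 100, 101, 100]

s2_1 = statistics.variance(instrument_1)   # 34/9, with f = 9
s2_2 = statistics.variance(instrument_2)   # about 6.48, with f = 6

# Larger variance in the numerator, as in (8.2.8).
F = max(s2_1, s2_2) / min(s2_1, s2_2)
F_limit = 3.37                              # F_0.95(6, 9) as quoted from Table I.8
print(f"F = {F:.3f}, reject equal variances: {F > F_limit}")
```

With the unrounded variances the ratio is exactly $12/7 \approx 1.71$. This is slightly smaller than the value obtained from the rounded entries of Table 8.1, and the conclusion is unchanged.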


8.3  Student’s  Test:  Comparison  of  Means 

We now consider a population that follows a Gaussian distribution. Let $\bar{x}$ be the arithmetic mean of a sample of size $N$. According to (6.2.3) the variance of $\bar{x}$ is related to the population variance by

$$ \sigma^2(\bar{x}) = \sigma^2(x)/N . \qquad (8.3.1) $$

If $N$ is sufficiently large, then from the Central Limit Theorem $\bar{x}$ will be normally distributed with mean $\hat{x}$ and variance $\sigma^2(\bar{x})$. That is,

$$ y = (\bar{x} - \hat{x})/\sigma(\bar{x}) \qquad (8.3.2) $$

will be described by a standard normal distribution. The quantity $\sigma(\bar{x})$ is, however, not known. We only know the estimate for $\sigma^2(x)$,

$$ s_x^2 = \frac{1}{N-1}\sum_{j=1}^{N}(x_j - \bar{x})^2 . \qquad (8.3.3) $$

Then with (8.3.1) we can also estimate $\sigma^2(\bar{x})$ to be

$$ s_{\bar{x}}^2 = \frac{1}{N(N-1)}\sum_{j=1}^{N}(x_j - \bar{x})^2 . \qquad (8.3.4) $$


We now ask to what extent (8.3.2) differs from the standard Gaussian distribution if $\sigma(\bar{x})$ is replaced by $s_{\bar{x}}$. By means of a simple translation of coordinates we can always have $\hat{x} = 0$. We therefore only consider the distribution of

$$ t = \bar{x}/s_{\bar{x}} = \bar{x}\sqrt{N}/s_x . \qquad (8.3.5) $$

Since $(N-1)s_x^2/\sigma^2(x) = f s_x^2/\sigma^2(x)$ follows a $\chi^2$-distribution with $f = N - 1$ degrees of freedom, and since $t$ is unchanged by a change of scale of $x$ (so that we may set $\sigma(x) = 1$), we can write

$$ t = \frac{\bar{x}\sqrt{N}\sqrt{f}}{\chi} . $$

Table 8.1: F-test on the equality of variances. Data from Example 8.1.

Measurement          Measurement with      Measurement with
number               Instrument 1 [μm]     Instrument 2 [μm]
 1                   100                    97
 2                   101                   102
 3                   103                   103
 4                    98                    96
 5                    97                   100
 6                    98                   101
 7                   102                   100
 8                   101
 9                    99
10                   101
Mean                 100                   99.8
Degrees of freedom     9                     6
$s^2$                34/9 = 3.7            39/6 = 6.5

$F = s_2^2/s_1^2 = 6.5/3.7 \approx 1.76$

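The distribution function of $t$ is given by

$$ F(t) = P(\mathbf{t} < t) = P\left(\frac{\bar{x}\sqrt{N}\sqrt{f}}{\chi} < t\right) . \qquad (8.3.6) $$

After a somewhat lengthy calculation one finds

$$ F(t) = \frac{\Gamma\!\left(\frac12(f+1)\right)}{\Gamma\!\left(\frac12 f\right)\sqrt{\pi}\sqrt{f}} \int_{-\infty}^{t}\left(1 + \frac{t'^2}{f}\right)^{-\frac12(f+1)}\mathrm{d}t' . $$

The corresponding probability density is

$$ f(t) = \frac{\Gamma\!\left(\frac12(f+1)\right)}{\Gamma\!\left(\frac12 f\right)\sqrt{\pi}\sqrt{f}}\left(1 + \frac{t^2}{f}\right)^{-\frac12(f+1)} . \qquad (8.3.7) $$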


Fig. 8.3: Student’s distribution $f(t)$ for $f = 1, 2, \ldots, 10$ degrees of freedom. For $f = 1$ the maximum is lowest and the tails are especially prominent.

Figure 8.3 shows the function $f(t)$ for various degrees of freedom $f = N - 1$. A comparison with Fig. 5.7 shows that for $f \to \infty$ the distribution (8.3.7) becomes the standard normal distribution $\varphi_0(t)$, as expected. Like $\varphi_0(t)$, $f(t)$ is symmetric about 0 and bell-shaped. Corresponding to (5.8.3) one has

$$ P(|\mathbf{t}| \le t) = 2F(|t|) - 1 . \qquad (8.3.8) $$
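The short Python sketch below evaluates the density (8.3.7) and shows its limiting behaviour in numbers. For $f = 1$ it reduces to the Cauchy density $1/[\pi(1+t^2)]$, a well-known special case. For large $f$ it approaches $\varphi_0(t)$. The function names are illustrative.

```python
import math

def student_density(t, f):
    """Student's t density (8.3.7) for f degrees of freedom."""
    log_c = (math.lgamma(0.5 * (f + 1)) - math.lgamma(0.5 * f)
             - 0.5 * math.log(math.pi * f))
    return math.exp(log_c - 0.5 * (f + 1) * math.log1p(t * t / f))

def standard_normal_density(t):
    """phi_0(t), the limit of f(t) for f -> infinity."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

for f in (1, 2, 10, 100, 10000):
    print(f, student_density(0.0, f), student_density(3.0, f))
print("normal", standard_normal_density(0.0), standard_normal_density(3.0))
```

The printed values at $t = 3$ show the prominent tails for small $f$ that are visible in Fig. 8.3.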


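By requiring

$$ \int_0^{t'_\alpha} f(t)\,\mathrm{d}t = \tfrac12(1 - \alpha) \qquad (8.3.9) $$

we can again determine limits $\pm t'_\alpha$ at a given significance level $\alpha$, where

$$ t'_\alpha = t_{1-\frac12\alpha} . $$

The quantiles $t_{1-\frac12\alpha}$ are given in Table I.9 for various values of $\alpha$ and $f$.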
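As an alternative to Table I.9, the limit $t'_\alpha$ can be obtained numerically from its defining relation (8.3.9). One integrates the density from 0 and solves for the upper limit by bisection. The sketch below does this under the same assumptions as the earlier F-distribution sketches: a Simpson rule, bisection, and hypothetical function names.

```python
import math

def student_density(t, f):
    """Student's t density (8.3.7) for f degrees of freedom."""
    log_c = (math.lgamma(0.5 * (f + 1)) - math.lgamma(0.5 * f)
             - 0.5 * math.log(math.pi * f))
    return math.exp(log_c - 0.5 * (f + 1) * math.log1p(t * t / f))

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def t_limit(alpha, f, tol=1e-10):
    """t'_alpha = t_{1-alpha/2}: solve integral_0^t f(u) du = (1 - alpha)/2, cf. (8.3.9)."""
    target = 0.5 * (1.0 - alpha)
    half_cdf = lambda x: simpson(lambda u: student_density(u, f), 0.0, x)
    lo, hi = 0.0, 1.0
    while half_cdf(hi) < target:    # widen the bracket
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:            # bisection
        mid = 0.5 * (lo + hi)
        if half_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(t_limit(0.05, 21))   # two-sided 5% limit for f = 21
```

For $f = 1$ the result can be checked against the closed form $\tan[\pi(\frac12 - \frac12\alpha)]$ of the Cauchy distribution.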

The application of Student’s test² can be described in the following way. One has a hypothesis $\lambda_0$ for the population mean of a normal distribution. A sample of size $N$ yields the sample mean $\bar{x}$ and the sample variance $s_x^2$. If the inequality

$$ |t| = \frac{|\bar{x} - \lambda_0|\sqrt{N}}{s_x} > t'_\alpha = t_{1-\frac12\alpha} \qquad (8.3.10) $$

is fulfilled for a given significance level $\alpha$, then the hypothesis must be rejected.

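This is clearly a two-sided test. If deviations in only one direction are important, then the corresponding test at the significance level $\alpha$ is

$$ \pm\frac{(\bar{x} - \lambda_0)\sqrt{N}}{s_x} > t''_\alpha = t_{1-\alpha} . \qquad (8.3.11) $$

The upper sign is used if only deviations with $\bar{x} > \lambda_0$ are of interest, and the lower sign if only deviations with $\bar{x} < \lambda_0$ are.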
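The statistic of (8.3.10) and (8.3.11) takes only a few lines with the standard `statistics` module, whose `stdev` uses the $1/(N-1)$ estimate (8.3.3). The sample below is for illustration only: it reuses the instrument-2 readings of Table 8.1 with the nominal value $\lambda_0 = 100\,\mu$m. The quantile still has to be taken from Table I.9 and is passed in as an argument. The helper names are hypothetical.

```python
import math
import statistics

def student_t(sample, lambda0):
    """t = (x_bar - lambda0) * sqrt(N) / s_x, as in (8.3.10) and (8.3.11)."""
    n = len(sample)
    return (statistics.fmean(sample) - lambda0) * math.sqrt(n) / statistics.stdev(sample)

def reject_two_sided(t, t_quantile):
    """(8.3.10): reject the hypothesis if |t| > t_{1-alpha/2}."""
    return abs(t) > t_quantile

def reject_one_sided(t, t_quantile, upper=True):
    """(8.3.11): reject if +t (upper=True) or -t (upper=False) exceeds t_{1-alpha}."""
    return (t if upper else -t) > t_quantile

# Illustration: instrument 2 of Table 8.1 tested against lambda0 = 100 micrometres.
t = student_t([97, 102, 103, 96, 100, 101, 100], 100.0)
print(t, reject_two_sided(t, 2.447))   # 2.447 is t_0.975 for f = 6
```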


We can make the test more general and apply it to the problem of comparing two mean values. Suppose samples of size $N_1$ and $N_2$ have been taken from two populations $X$ and $Y$. We wish to test the hypothesis that the expectation values are equal,

$$ \hat{x} = \hat{y} . $$

Because of the Central Limit Theorem, the mean values are almost normally distributed. Their variances are

$$ \sigma^2(\bar{x}) = \frac{1}{N_1}\sigma^2(x) , \qquad \sigma^2(\bar{y}) = \frac{1}{N_2}\sigma^2(y) , \qquad (8.3.12) $$

and the estimates for these quantities are


²The t-distribution was introduced by W. S. Gosset and published under the pseudonym “Student”.
$$ s_{\bar{x}}^2 = \frac{1}{N_1(N_1-1)}\sum_{j=1}^{N_1}(x_j - \bar{x})^2 , \qquad s_{\bar{y}}^2 = \frac{1}{N_2(N_2-1)}\sum_{j=1}^{N_2}(y_j - \bar{y})^2 . \qquad (8.3.13) $$


According to the discussion in Example 5.10, the difference

$$ \Delta = \bar{x} - \bar{y} \qquad (8.3.14) $$

also has an approximately normal distribution with

$$ \sigma^2(\Delta) = \sigma^2(\bar{x}) + \sigma^2(\bar{y}) . \qquad (8.3.15) $$

If the hypothesis of equal means is true, i.e., $\hat{\Delta} = 0$, then the ratio

$$ \Delta/\sigma(\Delta) \qquad (8.3.16) $$

follows the standard normal distribution. If $\sigma(\Delta)$ were known, one could immediately give the probability according to (5.8.2) for the hypothesis to be fulfilled. But only an estimate $s_\Delta$ of $\sigma(\Delta)$ is known. The corresponding ratio

$$ \Delta/s_\Delta \qquad (8.3.17) $$

will in general be somewhat larger.

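Usually the hypothesis $\hat{x} = \hat{y}$ implies that $x$ and $y$ come from the same population. Then $\sigma^2(x)$ and $\sigma^2(y)$ are equal, and we can use the weighted mean of $s_x^2$ and $s_y^2$ as the corresponding estimator. The weights are given by $(N_1 - 1)$ and $(N_2 - 1)$:

$$ s^2 = \frac{(N_1 - 1)s_x^2 + (N_2 - 1)s_y^2}{(N_1 - 1) + (N_2 - 1)} . \qquad (8.3.18) $$

From this we construct

$$ s_{\bar{x}}^2 = \frac{s^2}{N_1} , \qquad s_{\bar{y}}^2 = \frac{s^2}{N_2} $$

and

$$ s_\Delta^2 = s_{\bar{x}}^2 + s_{\bar{y}}^2 = \frac{N_1 + N_2}{N_1 N_2}\,s^2 . \qquad (8.3.19) $$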


It can be shown (see [8]) that the ratio (8.3.17) follows Student’s t-distribution with $f = N_1 + N_2 - 2$ degrees of freedom. With this one can now perform Student’s difference test.

The quantity (8.3.17) is computed from the results of two samples. This value is compared to a quantile of Student’s distribution with $f = N_1 + N_2 - 2$ degrees of freedom at a significance level $\alpha$. If

$$ |t| = \frac{|\Delta|}{s_\Delta} = \frac{|\bar{x} - \bar{y}|}{s_\Delta} > t'_\alpha = t_{1-\frac12\alpha} , \qquad (8.3.20) $$

then the hypothesis of equal means must be rejected. Instead one would assume $\hat{x} > \hat{y}$ or $\hat{x} < \hat{y}$, depending on whether $\bar{x} > \bar{y}$ or $\bar{x} < \bar{y}$.


Example 8.2: Student’s test of the hypothesis of equal means of two series of measurements

Column $x$ of Table 8.2 contains measured values (in arbitrary units) of the concentration of neuraminic acid in the red blood cells of patients suffering from a certain blood disease. Column $y$ gives the measured values for a group of healthy persons. From the mean values and variances of the two samples one finds

$$ |\bar{x} - \bar{y}| = 1.3 , \qquad s^2 = \frac{15 s_x^2 + 6 s_y^2}{21} = 9.15 , $$

$$ s_\Delta^2 = \frac{N_1 + N_2}{N_1 N_2}\,s^2 = \frac{23}{112}\,s^2 \approx 1.88 , \qquad t = \frac{|\bar{x} - \bar{y}|}{s_\Delta} \approx \frac{1.3}{1.37} \approx 0.95 . $$

For $\alpha = 5\%$ and $f = 21$ we find $t_{1-\frac12\alpha} = 2.08$. Since $t$ lies well below this limit, we must conclude that the experimental data are not sufficient to establish an influence of the disease on the concentration. ■
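Student’s difference test, equations (8.3.18) to (8.3.20), can be sketched with the standard `statistics` module as shown below. The function name is hypothetical. The data are the two columns of Table 8.2 as transcribed here. Working from the unrounded entries, the sketch gives $t \approx 0.91$. This differs slightly from a hand calculation with rounded intermediate values, but the conclusion is the same.

```python
import math
import statistics

def student_difference_test(x, y):
    """Return |t| of (8.3.20), using the pooled variance (8.3.18), and f = N1 + N2 - 2."""
    n1, n2 = len(x), len(y)
    s2 = ((n1 - 1) * statistics.variance(x)
          + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)      # (8.3.18)
    s_delta = math.sqrt((n1 + n2) / (n1 * n2) * s2)                 # (8.3.19)
    t = abs(statistics.fmean(x) - statistics.fmean(y)) / s_delta    # (8.3.20)
    return t, n1 + n2 - 2

# Table 8.2: x = patients, y = healthy persons (arbitrary units).
x = [21, 24, 18, 19, 25, 17, 18, 22, 21, 23, 18, 13, 16, 23, 22, 24]
y = [16, 20, 22, 19, 18, 19, 19]

t, f = student_difference_test(x, y)
print(f"t = {t:.2f}, f = {f}, reject at alpha = 5%: {t > 2.08}")   # 2.08 = t_0.975 for f = 21
```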


Table 8.2: Student’s difference test on the equality of means. Data from Example 8.2.

 $x$       $y$
 21        16
 24        20
 18        22
 19        19
 25        18
 17        19
 18        19
 22
 21
 23
 18
 13
 16
 23
 22
 24

$N_1 = 16$                $N_2 = 7$
$\bar{x} = 20.3$          $\bar{y} = 19.0$
$s_x^2 = 171.8/15$        $s_y^2 = 20/6$


8.4  Concepts  of  the  General  Theory  of  Tests

The test procedures discussed so far have been obtained more or less intuitively and without rigorous justification. In particular, we have not given any specific reasons for the choice of the critical region. We now want to deal with the theory of statistical tests in a somewhat more critical way. A complete treatment of this topic would, however, go beyond the scope of this book.

Each sample of size $N$ can be characterized by $N$ points in the sample space of Sect. 2.1. For simplicity we will limit ourselves to a continuous random variable $x$, so that the sample can be described by $N$ points $(x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ on the $x$ axis. In the case of $r$ random variables we would have $N$ points in an $r$-dimensional space. The result of such a sample, however, could also be specified by a single point in a space of dimension $rN$. A sample of size 2 with a single variable could, for example, be depicted as a point in a two-dimensional plane spanned by the axes $x^{(1)}, x^{(2)}$. We will call such a space the E space. Every hypothesis $H$ consists of an assumption about the probability density


$$ f(x; \lambda_1, \lambda_2, \ldots, \lambda_p) = f(x; \boldsymbol{\lambda}) . \qquad (8.4.1) $$

The hypothesis is said to be simple if the function $f$ is completely specified, i.e., if the hypothesis gives the values of all of the parameters $\lambda_i$. It is said to be composite if the general mathematical form of $f$ is known but the exact value of at least one parameter remains undetermined. A simple hypothesis could, for example, specify a standard Gaussian distribution. A Gaussian distribution with a mean of zero but an undetermined variance, however, is a composite hypothesis. The hypothesis $H_0$ to be tested is called the null hypothesis. Sometimes we will write explicitly

$$ H_0(\boldsymbol{\lambda} = \boldsymbol{\lambda}_0) = H_0(\lambda_1 = \lambda_{10}, \lambda_2 = \lambda_{20}, \ldots, \lambda_p = \lambda_{p0}) . \qquad (8.4.2) $$

Other possible hypotheses are called alternative hypotheses, e.g.,

$$ H_1(\boldsymbol{\lambda} = \boldsymbol{\lambda}_1) = H_1(\lambda_1 = \lambda_{11}, \lambda_2 = \lambda_{21}, \ldots, \lambda_p = \lambda_{p1}) . \qquad (8.4.3) $$

Often one wants to test a null hypothesis of the type (8.4.2) against a composite alternative hypothesis

$$ H_1(\boldsymbol{\lambda} \ne \boldsymbol{\lambda}_0) = H_1(\lambda_1 \ne \lambda_{10}, \lambda_2 \ne \lambda_{20}, \ldots, \lambda_p \ne \lambda_{p0}) . \qquad (8.4.4) $$

Since the null hypothesis makes a statement about the probability density in the sample space, it also predicts the probability for observing a point $X = (x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ in the E space.³ We now define a critical region $S_c$ with the significance level $\alpha$ by the requirement

$$ P(X \in S_c \mid H_0) = \alpha , \qquad (8.4.5) $$

i.e., we determine $S_c$ such that the probability to observe a point $X$ within $S_c$ is $\alpha$, under the assumption that $H_0$ is true. If the point $X$ from the sample actually falls into the region $S_c$, then the hypothesis $H_0$ is rejected. Note that the requirement (8.4.5) does not necessarily determine the critical region $S_c$ uniquely.

Although using the E space is conceptually elegant, it is usually not very convenient for carrying out tests. Instead one constructs a test statistic

$$ T = T(X) = T(x^{(1)}, x^{(2)}, \ldots, x^{(N)}) \qquad (8.4.6) $$

and determines a region $U$ of the variable $T$ that corresponds to the critical region $S_c$, i.e., one performs the mapping

$$ X \to T(X) , \qquad S_c(X) \to U(X) . \qquad (8.4.7) $$

The null hypothesis is rejected if $T \in U$.

Because of the statistical nature of the sample, it is clearly possible that the null hypothesis is true even though it was rejected because $X \in S_c$. The probability for such an error, an error of the first kind, is equal to $\alpha$. There is in addition another possibility of making a wrong decision: one does not reject the hypothesis $H_0$ because $X$ was not in the critical region $S_c$, even though the hypothesis was actually false and an alternative hypothesis was true. This is an error of the second kind. The probability for this,

$$ P(X \notin S_c \mid H_1) = \beta , \qquad (8.4.8) $$

depends of course on the particular alternative hypothesis $H_1$. This connection provides us with a method to specify the critical region $S_c$. A test is clearly most reasonable if, for a given significance level $\alpha$, the critical region is chosen such that the probability $\beta$ for an error of the second kind is a minimum. The critical region, and therefore the test itself, naturally depends on the alternative hypothesis under consideration.

³Although $X$ and the function $T(X)$ introduced below are random variables, we do not use a special character type for them.

Once the critical region has been determined, we can consider the probability for rejecting the null hypothesis as a function of the “true” hypothesis, or rather as a function of the parameters that describe it. In analogy to (8.4.5), this is

$$ M(S_c, \boldsymbol{\lambda}) = P(X \in S_c \mid H) = P(X \in S_c \mid \boldsymbol{\lambda}) . \qquad (8.4.9) $$

This probability is a function of $S_c$ and of the parameters $\boldsymbol{\lambda}$. It is called the power function of a test. The complementary probability

$$ L(S_c, \boldsymbol{\lambda}) = 1 - M(S_c, \boldsymbol{\lambda}) \qquad (8.4.10) $$

is called the acceptance probability or the operating characteristic function of the test. It gives the probability to accept⁴ the null hypothesis. One clearly has

$$ M(S_c, \boldsymbol{\lambda}_0) = \alpha , \quad M(S_c, \boldsymbol{\lambda}_1) = 1 - \beta , \quad L(S_c, \boldsymbol{\lambda}_0) = 1 - \alpha , \quad L(S_c, \boldsymbol{\lambda}_1) = \beta . \qquad (8.4.11) $$

The most powerful test of a simple hypothesis $H_0$ with respect to the simple alternative hypothesis $H_1$ is defined by the requirement

$$ M(S_c, \boldsymbol{\lambda}_1) = 1 - \beta = \max . \qquad (8.4.12) $$

Sometimes there exists a uniformly most powerful test, for which the requirement (8.4.12) holds for all possible alternative hypotheses.

A test is said to be unbiased if its power function is greater than or equal to $\alpha$ for all alternative hypotheses:

$$ M(S_c, \boldsymbol{\lambda}_1) \ge \alpha . \qquad (8.4.13) $$

This definition is reasonable, since the probability to reject the null hypothesis is then a minimum if the null hypothesis is true. An unbiased most powerful test is the most powerful of all the unbiased tests. Correspondingly, one can define an unbiased uniformly most powerful test. In the next sections we will learn the rules that sometimes allow one to construct tests with such desirable properties. Before turning to this task, we will first give an example to illustrate the definitions just introduced.


⁴We use here the word “acceptance” of a hypothesis, although more precisely one should say, “There is no evidence to reject the hypothesis.”
Example 8.3: Test of the hypothesis that a normal distribution with given variance $\sigma^2$ has the mean $\lambda = \lambda_0$

We wish to test the hypothesis $H_0(\lambda = \lambda_0)$. As a test statistic we use the arithmetic mean $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$. (We will see in Example 8.4 that this is the most appropriate test statistic for our purposes.) From Sect. 6.2 we know that $\bar{x}$ is normally distributed with mean $\lambda$ and variance $\sigma^2/n$. Hence the probability density for $\bar{x}$ in the case $\lambda = \lambda_0$ is given by

$$ f(\bar{x}; \lambda_0) = \sqrt{\frac{n}{2\pi\sigma^2}}\,\exp\left[-\frac{n}{2\sigma^2}(\bar{x} - \lambda_0)^2\right] . \qquad (8.4.14) $$


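This is shown in Fig. 8.4 together with four different critical regions, all of which have the same significance level $\alpha$. These are the regions

$$ U_1:\ \bar{x} < \lambda^{\rm I} \text{ and } \bar{x} > \lambda^{\rm II} \quad\text{with}\quad \int_{-\infty}^{\lambda^{\rm I}} f(\bar{x})\,\mathrm{d}\bar{x} = \int_{\lambda^{\rm II}}^{\infty} f(\bar{x})\,\mathrm{d}\bar{x} = \tfrac12\alpha , $$

$$ U_2:\ \bar{x} > \lambda^{\rm III} \quad\text{with}\quad \int_{\lambda^{\rm III}}^{\infty} f(\bar{x})\,\mathrm{d}\bar{x} = \alpha , $$

$$ U_3:\ \bar{x} < \lambda^{\rm IV} \quad\text{with}\quad \int_{-\infty}^{\lambda^{\rm IV}} f(\bar{x})\,\mathrm{d}\bar{x} = \alpha , $$

$$ U_4:\ \lambda^{\rm V} < \bar{x} < \lambda^{\rm VI} \quad\text{with}\quad \int_{\lambda^{\rm V}}^{\lambda_0} f(\bar{x})\,\mathrm{d}\bar{x} = \int_{\lambda_0}^{\lambda^{\rm VI}} f(\bar{x})\,\mathrm{d}\bar{x} = \tfrac12\alpha . $$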

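In order to obtain the power functions for each of these regions, we must vary the mean value $\lambda$. In analogy to (8.4.14), the probability density of $\bar{x}$ for an arbitrary value of $\lambda$ is given by

$$ f(\bar{x}; \lambda) = \sqrt{\frac{n}{2\pi\sigma^2}}\,\exp\left[-\frac{n}{2\sigma^2}(\bar{x} - \lambda)^2\right] . \qquad (8.4.15) $$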


The dashed curve in Fig. 8.4b represents the probability density for $\lambda = \lambda_1 = \lambda_0 + 1$. The power function (8.4.9) is now simply

$$ P(\bar{x} \in U \mid \lambda) = \int_U f(\bar{x}; \lambda)\,\mathrm{d}\bar{x} . \qquad (8.4.16) $$


The power functions obtained in this way for the critical regions $U_1$, $U_2$, $U_3$, $U_4$ are shown in Fig. 8.4c for $n = 2$ (solid curve) and $n = 10$ (dashed curve).

We can now compare the effects of the four tests corresponding to the various critical regions. From Fig. 8.4c we immediately see that $U_1$ corresponds to an unbiased test, since the requirement (8.4.13) is clearly fulfilled. On the other hand, the test with the critical region $U_2$ is more powerful for the alternative hypothesis $H_1(\lambda_1 > \lambda_0)$, but is not good for $H_1(\lambda_1 < \lambda_0)$. For the test with $U_3$ the situation is exactly the opposite. Finally, the region $U_4$ provides a test for which the rejection probability is a maximum if the null hypothesis is true. Clearly this is very undesirable; this test was constructed only for demonstration purposes. If we compare the first three tests, we see that none of them is more powerful than the other two for all values of $\lambda_1$. Thus we have not succeeded in finding a uniformly most powerful test. In Example 8.4, where we continue the discussion of the present example, we will see that no uniformly most powerful test exists for this problem. ■

Fig. 8.4: (a) Critical regions in E space, (b) critical region of the test function, and (c) power function of the test from Example 8.3.
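The four power functions of Example 8.3 follow directly from the standard normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library for that function and its inverse. The function name and the parameter values in the usage loop are arbitrary illustrations.

```python
from statistics import NormalDist

PHI = NormalDist()   # standard normal distribution: PHI.cdf and PHI.inv_cdf

def power_functions(lam, lam0, sigma, n, alpha):
    """Rejection probabilities P(x_bar in U_i | lam) for the regions U1..U4 of Example 8.3."""
    s = sigma / n ** 0.5                        # standard deviation of x_bar
    below = lambda c: PHI.cdf((c - lam) / s)    # P(x_bar < c | lam)
    z2 = PHI.inv_cdf(1 - alpha / 2)
    z1 = PHI.inv_cdf(1 - alpha)
    zc = PHI.inv_cdf(0.5 + alpha / 2)
    return {
        "U1": below(lam0 - z2 * s) + 1 - below(lam0 + z2 * s),  # lambda^I, lambda^II
        "U2": 1 - below(lam0 + z1 * s),                          # lambda^III
        "U3": below(lam0 - z1 * s),                              # lambda^IV
        "U4": below(lam0 + zc * s) - below(lam0 - zc * s),       # lambda^V, lambda^VI
    }

for lam in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(lam, {k: round(v, 3) for k, v in power_functions(lam, 0.0, 1.0, 2, 0.05).items()})
```

The printed table shows the behaviour described above. All four functions equal $\alpha$ at $\lambda = \lambda_0$. $U_1$ never drops below $\alpha$. $U_2$ and $U_3$ each win on one side only. $U_4$ has its maximum at $\lambda_0$.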


8.5  The  Neyman-Pearson  Lemma  and  Applications 

In the last section we introduced the E space, in which a sample is represented by a single point $X$. The probability to observe a point $X$ within the critical region $S_c$, provided the null hypothesis $H_0$ is true, was defined in (8.4.5):

$$ P(X \in S_c \mid H_0) = \alpha . \qquad (8.5.1) $$

We now define a conditional probability density in E space, $f(X \mid H_0)$. One clearly has

$$ \int_{S_c} f(X \mid H_0)\,\mathrm{d}X = P(X \in S_c \mid H_0) = \alpha . \qquad (8.5.2) $$

The Neyman-Pearson lemma states the following:

A test of the simple hypothesis $H_0$ with respect to the simple alternative hypothesis $H_1$ is a most powerful test if the critical region $S_c$ in E space is chosen such that

$$ \frac{f(X \mid H_0)}{f(X \mid H_1)} \begin{cases} \le c & \text{for all } X \in S_c , \\ \ge c & \text{for all } X \notin S_c . \end{cases} \qquad (8.5.3) $$

Here $c$ is a constant that depends on the significance level.


We will prove this by considering another region $S$ along with $S_c$. It may partially overlap with $S_c$, as sketched in Fig. 8.5. We choose the size of the region $S$ such that it corresponds to the same significance level, i.e.,

$$ \int_S f(X \mid H_0)\,\mathrm{d}X = \int_{S_c} f(X \mid H_0)\,\mathrm{d}X = \alpha . $$
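We use the notation of Fig. 8.5: $C$ denotes the overlap of $S$ and $S_c$, $A$ the remaining part of $S_c$, and $B$ the remaining part of $S$. We can then write

$$ \int_A f(X \mid H_0)\,\mathrm{d}X = \int_{S_c} f(X \mid H_0)\,\mathrm{d}X - \int_C f(X \mid H_0)\,\mathrm{d}X = \int_S f(X \mid H_0)\,\mathrm{d}X - \int_C f(X \mid H_0)\,\mathrm{d}X = \int_B f(X \mid H_0)\,\mathrm{d}X . $$

Since $A$ is contained in $S_c$, we can use (8.5.3), i.e.,

$$ \int_A f(X \mid H_0)\,\mathrm{d}X \le c\int_A f(X \mid H_1)\,\mathrm{d}X . $$

Correspondingly, since $B$ is outside of $S_c$, one has

$$ \int_B f(X \mid H_0)\,\mathrm{d}X \ge c\int_B f(X \mid H_1)\,\mathrm{d}X . $$

We can now express the power function (8.4.9) using these integrals:

$$ \begin{aligned} M(S_c, \lambda_1) &= \int_{S_c} f(X \mid H_1)\,\mathrm{d}X = \int_A f(X \mid H_1)\,\mathrm{d}X + \int_C f(X \mid H_1)\,\mathrm{d}X \\ &\ge \frac{1}{c}\int_A f(X \mid H_0)\,\mathrm{d}X + \int_C f(X \mid H_1)\,\mathrm{d}X = \frac{1}{c}\int_B f(X \mid H_0)\,\mathrm{d}X + \int_C f(X \mid H_1)\,\mathrm{d}X \\ &\ge \int_B f(X \mid H_1)\,\mathrm{d}X + \int_C f(X \mid H_1)\,\mathrm{d}X = \int_S f(X \mid H_1)\,\mathrm{d}X = M(S, \lambda_1) , \end{aligned} $$

or directly,

$$ M(S_c, \lambda_1) \ge M(S, \lambda_1) . \qquad (8.5.4) $$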


This is exactly the condition (8.4.12) for a most powerful test. We have made no assumptions about the alternative hypothesis $H_1(\lambda = \lambda_1)$ or about the region $S$. It follows that the requirement (8.5.3) yields a uniformly most powerful test whenever one and the same region $S_c$ fulfills it for every alternative hypothesis under consideration.
Example 8.4: Most powerful test for the problem of Example 8.3

We now continue with the ideas from Example 8.3, i.e., we consider tests with a sample of size $N$ obtained from a normal distribution with known variance $\sigma^2$ and unknown mean $\lambda$. The conditional probability density of a point $X = (x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ in E space is the joint probability density of the $x^{(j)}$ for given values of $\lambda$, i.e.,

$$ f(X \mid H_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^N \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{N}(x^{(j)} - \lambda_0)^2\right] \qquad (8.5.5) $$

for the null hypothesis and

$$ f(X \mid H_1) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^N \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{N}(x^{(j)} - \lambda_1)^2\right] \qquad (8.5.6) $$

for the alternative hypothesis. The ratio (8.5.3) takes the form


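$$ Q = \frac{f(X \mid H_0)}{f(X \mid H_1)} = \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{j=1}^{N}(x^{(j)} - \lambda_0)^2 - \sum_{j=1}^{N}(x^{(j)} - \lambda_1)^2\right]\right\} = \exp\left\{-\frac{1}{2\sigma^2}\left[N(\lambda_0^2 - \lambda_1^2) - 2(\lambda_0 - \lambda_1)\sum_{j=1}^{N}x^{(j)}\right]\right\} . $$

The expression

$$ \exp\left[-\frac{N}{2\sigma^2}(\lambda_0^2 - \lambda_1^2)\right] = k > 0 $$

is a positive constant. Since $\sum_j x^{(j)} = N\bar{x}$, the condition (8.5.3) thus has the form

$$ k\,\exp\left[\frac{N}{\sigma^2}(\lambda_0 - \lambda_1)\,\bar{x}\right] \begin{cases} \le c , & X \in S_c , \\ \ge c , & X \notin S_c . \end{cases} $$

Since the logarithm is monotonic, this is the same as

$$ (\lambda_0 - \lambda_1)\,\bar{x} \begin{cases} \le c' , & X \in S_c , \\ \ge c' , & X \notin S_c . \end{cases} \qquad (8.5.7) $$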


Here $c'$ is a constant different from $c$. Equation (8.5.7), however, is not only a condition for $S_c$; it also specifies directly that $\bar{x}$ should be used as the test variable. For each given $\lambda_1$, i.e., for every simple alternative hypothesis $H_1(\lambda = \lambda_1)$, (8.5.7) gives a clear prescription for the choice of $S_c$ or $U$, i.e., for the critical region and the test variable $\bar{x}$.
For the case $\lambda_1 < \lambda_0$, the relation (8.5.7) becomes

$$ \bar{x} \begin{cases} \le c'' , & X \in S_c , \\ > c'' , & X \notin S_c . \end{cases} $$

This corresponds to the situation in Fig. 8.4b3 with $c'' = \lambda^{\rm IV}$. Similarly, for every alternative hypothesis with $\lambda_1 > \lambda_0$, the critical region of the most powerful test is given by

$$ \bar{x} \ge c''' $$

(see Fig. 8.4b2 with $c''' = \lambda^{\rm III}$). There does not exist a uniformly most powerful test, since the factor $(\lambda_0 - \lambda_1)$ in Eq. (8.5.7) changes sign at $\lambda_1 = \lambda_0$. ■
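The key step of Example 8.4 is that the ratio $Q$ depends on the sample only through $\bar{x}$, and this is easy to check numerically. The sketch compares two expressions for $\ln Q$. The first is computed directly from the joint densities (8.5.5) and (8.5.6), whose normalization factors cancel. The second is the form $\ln k + N(\lambda_0 - \lambda_1)\bar{x}/\sigma^2$. The simulated sample, the seed and the parameter values are arbitrary assumptions.

```python
import random

def log_q_direct(xs, lam0, lam1, sigma):
    """ln Q = ln f(X|H0) - ln f(X|H1) from (8.5.5) and (8.5.6); the normalizations cancel."""
    ss0 = sum((x - lam0) ** 2 for x in xs)
    ss1 = sum((x - lam1) ** 2 for x in xs)
    return -(ss0 - ss1) / (2 * sigma ** 2)

def log_q_via_mean(xs, lam0, lam1, sigma):
    """ln k + N (lam0 - lam1) x_bar / sigma^2, with ln k = -N (lam0^2 - lam1^2) / (2 sigma^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    log_k = -n * (lam0 ** 2 - lam1 ** 2) / (2 * sigma ** 2)
    return log_k + n * (lam0 - lam1) * x_bar / sigma ** 2

rng = random.Random(2024)
xs = [rng.gauss(0.3, 1.5) for _ in range(25)]
print(log_q_direct(xs, 0.0, 1.0, 1.5), log_q_via_mean(xs, 0.0, 1.0, 1.5))
```

$\ln Q$ is linear in $\bar{x}$ with slope $N(\lambda_0 - \lambda_1)/\sigma^2$. The sign of that slope flips when $\lambda_1$ crosses $\lambda_0$, which is exactly the reason stated above for the non-existence of a uniformly most powerful test.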


8.6  The  Likelihood-Ratio  Method 


The Neyman-Pearson lemma gave the condition for a uniformly most powerful test. Such a test did not exist if the alternative hypothesis included parameter values both greater and less than that of the null hypothesis. We saw this in Example 8.4; it can be shown, however, that it is true in general. The question thus arises as to what test should be used when no uniformly most powerful test exists. Clearly this question is not formulated precisely enough to allow a unique answer. In the following we give a prescription for constructing tests that have desirable properties and are relatively easy to use.

We consider from the beginning the general case with $p$ parameters $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_p)$. The result of a sample, i.e., the point $X = (x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ in E space, is to be used to test a given hypothesis. The (composite) null hypothesis is characterized by a given region for each parameter. We can use a $p$-dimensional space with $\lambda_1, \lambda_2, \ldots, \lambda_p$ as coordinate axes and consider the region allowed by the null hypothesis as a region in this parameter space, called $\omega$. We denote the region in this space representing all possible values of the parameters by $\Omega$. The most general alternative hypothesis is then the part of $\Omega$ that does not contain $\omega$. We denote this by $\Omega - \omega$. Recall now from Chap. 7 the maximum-likelihood estimator $\tilde{\lambda}$ for a parameter $\lambda$. It is that value of $\lambda$ for which the likelihood function is a maximum. In Chap. 7 we tacitly assumed that one searched for the maximum in the entire allowable parameter space. In the following we will also consider maxima in a restricted region (e.g., in $\omega$), and write $\tilde{\lambda}^{(\omega)}$ in this case. The likelihood-ratio test defines a test statistic

$$ T = \frac{f\big(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \tilde{\lambda}_1^{(\Omega)}, \tilde{\lambda}_2^{(\Omega)}, \ldots, \tilde{\lambda}_p^{(\Omega)}\big)}{f\big(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \tilde{\lambda}_1^{(\omega)}, \tilde{\lambda}_2^{(\omega)}, \ldots, \tilde{\lambda}_p^{(\omega)}\big)} . \qquad (8.6.1) $$
Here $f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \lambda_1, \lambda_2, \ldots, \lambda_p)$ is the joint probability density of the $x^{(j)}$ ($j = 1, 2, \ldots, N$), i.e., the likelihood function (7.1.5). The procedure of the likelihood-ratio test prescribes that we reject the null hypothesis if

$$ T > T_{1-\alpha} . \qquad (8.6.2) $$

Here $T_{1-\alpha}$ is defined by

$$ P(T > T_{1-\alpha} \mid H_0) = \int_{T_{1-\alpha}}^{\infty} g(T \mid H_0)\,\mathrm{d}T = \alpha , \qquad (8.6.3) $$

and $g(T \mid H_0)$ is the conditional probability density for the test statistic $T$. The following theorem by Wilks [9] concerns the distribution function of $T$ (or actually of $2\ln T$; note that $T \ge 1$ by construction) in the limiting case of very large samples:

If a population is described by the probability density $f(x; \lambda_1, \lambda_2, \ldots, \lambda_p)$ that satisfies reasonable requirements of continuity, and if $p - r$ of the $p$ parameters are fixed by the null hypothesis while $r$ parameters remain free, then the statistic $2\ln T$ follows a $\chi^2$-distribution with $p - r$ degrees of freedom for very large samples, i.e., for $N \to \infty$.

We  now  apply  this  method  to  the  problem  of  Examples  8.3  and  8.4,  i.e., 
we  consider  tests  with  samples  from  a  normally  distributed  population  with 
known  variance  and  unknown  mean. 


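Example 8.5: Power function for the test from Example 8.3

For the simple hypothesis $H_0(\lambda = \lambda_0)$, the region $\omega$ of the parameter space is reduced to the point $\lambda = \lambda_0$. We thus have

$$ \tilde{\lambda}^{(\omega)} = \lambda_0 . \qquad (8.6.4) $$

If we consider the most general alternative hypothesis $H_1(\lambda = \lambda_1 \ne \lambda_0)$, then we obtain the sample mean $\bar{x}$ as the maximum-likelihood estimator of $\lambda$. The likelihood ratio (8.6.1) thus becomes

$$ T = \frac{f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \bar{x})}{f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \lambda_0)} . \qquad (8.6.5) $$

The joint probability density is given by (7.2.6),

$$ f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \lambda) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^N \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{N}(x^{(j)} - \lambda)^2\right] . \qquad (8.6.6) $$

Therefore,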
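$$ T = \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{j=1}^{N}(x^{(j)} - \bar{x})^2 - \sum_{j=1}^{N}(x^{(j)} - \lambda_0)^2\right]\right\} = \exp\left[\frac{1}{2\sigma^2}\sum_{j=1}^{N}(\bar{x} - \lambda_0)^2\right] = \exp\left[\frac{N}{2\sigma^2}(\bar{x} - \lambda_0)^2\right] . $$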


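We must now calculate $T_{1-\alpha}$ and reject the hypothesis $H_0$ if the inequality (8.6.2) is fulfilled. Since the logarithm of $T$ is a monotonic function of $T$, we can use

$$ T' = 2\ln T = \frac{N}{\sigma^2}(\bar{x} - \lambda_0)^2 \qquad (8.6.7) $$

as the test statistic and reject $H_0$ if

$$ T' > T'_{1-\alpha} \quad\text{with}\quad \int_{T'_{1-\alpha}}^{\infty} h(T' \mid H_0)\,\mathrm{d}T' = \alpha . $$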


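In order to calculate the probability density $h(T' \mid H_0)$ of $T'$, we start with the density $f(\bar{x})$ for the sample mean under the condition $\lambda = \lambda_0$,

$$ f(\bar{x} \mid H_0) = \sqrt{\frac{N}{2\pi\sigma^2}}\,\exp\left[-\frac{N}{2\sigma^2}(\bar{x} - \lambda_0)^2\right] . $$

In order to carry out the transformation of variables (3.7.1), we also need the derivative

$$ \left|\frac{\mathrm{d}\bar{x}}{\mathrm{d}T'}\right| = \frac12\sqrt{\frac{\sigma^2}{N}}\;T'^{-1/2} , $$

which is easily obtained from (8.6.7). The two values $\bar{x} = \lambda_0 \pm \sigma\sqrt{T'/N}$ lead to the same $T'$, so one has

$$ h(T' \mid H_0) = 2\left|\frac{\mathrm{d}\bar{x}}{\mathrm{d}T'}\right| f(\bar{x} \mid H_0) = \frac{1}{\sqrt{2\pi}}\,T'^{-1/2}\,e^{-T'/2} . \qquad (8.6.8) $$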


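This is indeed a $\chi^2$-distribution for one degree of freedom. Thus in our example Wilks’ theorem holds even for finite $N$. We see, therefore, that the likelihood-ratio test yields the unbiased test of Fig. 8.4b1. The test

$$ T' = \frac{N}{\sigma^2}(\bar{x} - \lambda_0)^2 > T'_{1-\alpha} $$

is equivalent to

$$ \left(\frac{N}{\sigma^2}\right)^{1/2}(\bar{x} - \lambda_0) < \lambda' \quad\text{or}\quad \left(\frac{N}{\sigma^2}\right)^{1/2}(\bar{x} - \lambda_0) > \lambda'' \qquad (8.6.9) $$

with

$$ -\lambda' = \lambda'' = (T'_{1-\alpha})^{1/2} = (\chi^2_{1-\alpha})^{1/2} = x_{1-\frac12\alpha} , $$

where $x_{1-\frac12\alpha}$ is the corresponding quantile of the standard normal distribution.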

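We can use this result to compute the power function of our test explicitly. For a given value of the population mean $\lambda$, the probability density for the sample mean is

$$ f(\bar{x}; \lambda) = \left(\frac{N}{2\pi\sigma^2}\right)^{1/2}\exp\left[-\frac{N(\bar{x} - \lambda)^2}{2\sigma^2}\right] = \frac{\sqrt{N}}{\sigma}\,\varphi_0\!\left(\frac{\bar{x} - \lambda}{\sigma/\sqrt{N}}\right) . $$

Using (8.4.9) and (8.6.9) we obtain

$$ M(S_c, \lambda) = \int_{-\infty}^{A} f(\bar{x}; \lambda)\,\mathrm{d}\bar{x} + \int_{B}^{\infty} f(\bar{x}; \lambda)\,\mathrm{d}\bar{x} = \psi_0\!\left(\frac{A - \lambda}{\sigma/\sqrt{N}}\right) + 1 - \psi_0\!\left(\frac{B - \lambda}{\sigma/\sqrt{N}}\right) , \qquad (8.6.10) $$

$$ A = -x_{1-\frac12\alpha}\,\frac{\sigma}{\sqrt{N}} + \lambda_0 , \qquad B = x_{1-\frac12\alpha}\,\frac{\sigma}{\sqrt{N}} + \lambda_0 . $$

Here $\varphi_0$ and $\psi_0$ are the probability density and distribution function of the standard normal distribution. The power function (8.6.10) is shown in Fig. 8.6 for $\alpha = 0.05$ and various values of $N/\sigma^2$. ■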
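The power function (8.6.10) can be checked against a small Monte Carlo simulation that applies the test $T' > \chi^2_{1-\alpha}$ of (8.6.7) to simulated samples. For one degree of freedom, $\chi^2_{1-\alpha}$ equals the square of the normal quantile $x_{1-\alpha/2}$. The sketch takes that quantile from `statistics.NormalDist`. The sample size, $\sigma$, the seed and the number of trials are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()   # standard normal distribution (psi_0 is PHI.cdf)

def power_analytic(lam, lam0, sigma, n, alpha):
    """Power function (8.6.10) of the two-sided test on the mean with known sigma."""
    x = PHI.inv_cdf(1 - alpha / 2)              # quantile x_{1-alpha/2}
    a = lam0 - x * sigma / math.sqrt(n)
    b = lam0 + x * sigma / math.sqrt(n)
    scale = math.sqrt(n) / sigma
    return PHI.cdf((a - lam) * scale) + 1 - PHI.cdf((b - lam) * scale)

def power_simulated(lam, lam0, sigma, n, alpha, trials, rng):
    """Fraction of simulated samples (true mean lam) with T' = N (x_bar - lam0)^2 / sigma^2 > chi2_{1-alpha}."""
    limit = PHI.inv_cdf(1 - alpha / 2) ** 2     # chi2_{1-alpha} for one degree of freedom
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(lam, sigma) for _ in range(n)) / n
        if n * (x_bar - lam0) ** 2 / sigma ** 2 > limit:
            hits += 1
    return hits / trials

rng = random.Random(12345)
for lam in (5.0, 6.0, 7.0):
    print(lam, round(power_analytic(lam, 5.0, 2.0, 3, 0.05), 4),
          power_simulated(lam, 5.0, 2.0, 3, 0.05, 20000, rng))
```

At $\lambda = \lambda_0$ both columns should be close to $\alpha$, which illustrates that the test has the intended significance level even for $N = 3$.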


Example 8.6: Test of the hypothesis that a normal distribution of unknown variance has the mean value $\lambda = \lambda_0$

In this case the null hypothesis $H_0(\lambda = \lambda_0)$ is composite, i.e., it makes no statement about $\sigma^2$. From Example 7.8 we know the maximum-likelihood estimators in the full parameter space,

$$ \tilde{\lambda}^{(\Omega)} = \bar{x} , \qquad \tilde{\sigma}^{2(\Omega)} = \frac{1}{N}\sum_{j=1}^{N}(x^{(j)} - \bar{x})^2 . $$

In the parameter space of the null hypothesis we have

$$ \tilde{\lambda}^{(\omega)} = \lambda_0 , \qquad \tilde{\sigma}^{2(\omega)} = \frac{1}{N}\sum_{j=1}^{N}(x^{(j)} - \lambda_0)^2 . $$

Fig. 8.6: Power function of the test from Example 8.5. The right-most curve corresponds to $N/\sigma^2 = 1$.


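The likelihood ratio (8.6.1) is then

$$ T = \left(\frac{\sum(x^{(j)} - \lambda_0)^2}{\sum(x^{(j)} - \bar{x})^2}\right)^{N/2} \exp\left(-\frac{N\sum(x^{(j)} - \bar{x})^2}{2\sum(x^{(j)} - \bar{x})^2} + \frac{N\sum(x^{(j)} - \lambda_0)^2}{2\sum(x^{(j)} - \lambda_0)^2}\right) = \left(\frac{\sum(x^{(j)} - \lambda_0)^2}{\sum(x^{(j)} - \bar{x})^2}\right)^{N/2} . $$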


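We transform again to a different test statistic $T'$ that is a monotonic function of $T$,

$$ T' = T^{2/N} = \frac{\sum(x^{(j)} - \lambda_0)^2}{\sum(x^{(j)} - \bar{x})^2} = \frac{\sum(x^{(j)} - \bar{x})^2 + N(\bar{x} - \lambda_0)^2}{\sum(x^{(j)} - \bar{x})^2} = 1 + \frac{t^2}{N - 1} , \qquad (8.6.11) $$

where

$$ t = \sqrt{N}\,\frac{\bar{x} - \lambda_0}{\left(\sum(x^{(j)} - \bar{x})^2/(N - 1)\right)^{1/2}} = \sqrt{N}\,\frac{\bar{x} - \lambda_0}{s_x} \qquad (8.6.12) $$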
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is  Student’s  test  variable  introduced  in  Sect.  8.3.  From  (8.6.11)  we  can  com¬ 
pute  a  value  of  t  for  a  given  sample  and  reject  the  null  hypothesis  if 


The very generally formulated method of the likelihood ratio has thus led us to Student's test, which was originally constructed for testing the mean of samples from a normal distribution of unknown variance. ■
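The identity (8.6.11) between the likelihood ratio and Student's variable can be checked numerically. The sketch below uses an arbitrary illustrative sample, not data from the text, and a hypothetical helper name.

```python
import math


def likelihood_ratio_and_t(sample, lam0):
    """Return the likelihood ratio T and Student's t of (8.6.12) for H0: lambda = lam0."""
    n = len(sample)
    mean = sum(sample) / n
    ss_mean = sum((x - mean) ** 2 for x in sample)   # sum of (x_j - xbar)^2
    ss_lam0 = sum((x - lam0) ** 2 for x in sample)   # sum of (x_j - lambda_0)^2
    T = (ss_lam0 / ss_mean) ** (n / 2.0)             # ratio of the maximized likelihoods
    s_xbar = math.sqrt(ss_mean / (n * (n - 1)))      # S_xbar, error of the sample mean
    t = (mean - lam0) / s_xbar
    return T, t


sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0]   # illustrative values only
n = len(sample)
T, t = likelihood_ratio_and_t(sample, 5.0)
print(T ** (2.0 / n), 1.0 + t ** 2 / (n - 1))  # the two sides of (8.6.11)
```

Both printed numbers agree up to rounding. Rejecting for large $T$ is therefore the same as rejecting for large $|t|$.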

8.7 The χ²-Test for Goodness-of-Fit

8.7.1 χ²-Test with Maximal Number of Degrees of Freedom

Suppose one has $N$ measured values $g_i$, $i = 1, 2, \dots, N$, each with a known measurement error $\sigma_i$. The meaning of the measurement error is the following: $g_i$ is a measurement of the (unknown) true quantity $h_i$. One has

$$ g_i = h_i + \varepsilon_i \,, \qquad i = 1, 2, \dots, N \,. \tag{8.7.1} $$

Here the deviation $\varepsilon_i$ is a random variable that follows a normal distribution with mean 0 and standard deviation $\sigma_i$.

We now want to test the hypothesis specifying the values $h_i$ on which the measurement is based,

$$ h_i = f_i \,, \qquad i = 1, 2, \dots, N \,. \tag{8.7.2} $$

If this hypothesis is true, then all of the quantities

$$ u_i = \frac{g_i - f_i}{\sigma_i} \,, \qquad i = 1, 2, \dots, N \,, \tag{8.7.3} $$

follow the standard Gaussian distribution. Therefore,

$$ T = \sum_{i=1}^{N} u_i^2 = \sum_{i=1}^{N} \left(\frac{g_i - f_i}{\sigma_i}\right)^2 \tag{8.7.4} $$

follows a $\chi^2$-distribution for $N$ degrees of freedom. If the hypothesis (8.7.2) is false, the deviations of the measured values $g_i$ from the values predicted by the hypothesis tend to be larger. This applies to the deviations normalized by the errors $\sigma_i$ as in (8.7.3). For a given significance level $\alpha$, the hypothesis (8.7.2) is rejected if

$$ T > \chi^2_{1-\alpha} \,, \tag{8.7.5} $$

i.e., if the quantity (8.7.4) is greater than the quantile $\chi^2_{1-\alpha}$ of the $\chi^2$-distribution for $N$ degrees of freedom.
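The test (8.7.4)-(8.7.5) needs only the normalized deviations and a tabulated quantile. The sketch below makes some assumptions: the helper names are hypothetical, the measurements are illustrative, and the quantile $\chi^2_{0.95} = 7.81$ for 3 degrees of freedom is passed in from a table rather than computed.

```python
def chi2_explicit(g, f, sigma):
    """T of (8.7.4): sum of squared normalized deviations u_i = (g_i - f_i) / sigma_i."""
    if not (len(g) == len(f) == len(sigma)):
        raise ValueError("g, f and sigma must have the same length")
    return sum(((gi - fi) / si) ** 2 for gi, fi, si in zip(g, f, sigma))


def reject(T, chi2_quantile):
    """Decision rule (8.7.5); chi2_quantile is chi^2_{1-alpha} for N degrees of freedom."""
    return T > chi2_quantile


g = [1.2, 1.9, 3.1]       # illustrative measurements
f = [1.0, 2.0, 3.0]       # values predicted by the hypothesis (8.7.2)
sigma = [0.1, 0.1, 0.1]   # known measurement errors
T = chi2_explicit(g, f, sigma)
print(T, reject(T, 7.81))  # 7.81 = chi^2_{0.95} for N = 3
```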
8.7.2 χ²-Test with Reduced Number of Degrees of Freedom

The number of degrees of freedom is reduced when the hypothesis to be tested is less explicit than (8.7.2). For this case we consider the following example. Suppose a quantity $g$ can be measured as a function of an independent controlled variable $t$, which itself can be set without error,

$$ g = g(t) \,. $$

The individual measurements $g_i$ correspond to given fixed values $t_i$ of the independent variable. The corresponding true quantities $h_i$ are given by some function

$$ h_i = h(t_i) \,. $$

A particularly simple hypothesis for this function is the linear equation

$$ f(t) = h(t) = at + b \,. \tag{8.7.6} $$

The hypothesis can in fact include specifying the numerical values for the parameters $a$ and $b$. In this case, all values $f_i$ in (8.7.2) are exactly known, and the quantity (8.7.4) follows, if the hypothesis is true, a $\chi^2$-distribution for $N$ degrees of freedom.

The hypothesis may, however, only state that there exists a linear relationship (8.7.6) between the controlled variable $t$ and the variable $h$, while the numerical values of the parameters $a$ and $b$ are unknown. In this case one constructs estimators $\tilde a$, $\tilde b$ for the parameters, which are functions of the measurements $g_i$ and the errors $\sigma_i$. The hypothesis (8.7.2) is then

$$ h_i = h(t_i) = f_i = \tilde a t_i + \tilde b \,. $$

Since $\tilde a$ and $\tilde b$ are functions of the measurements $g_i$, the normalized deviations $u_i$ in (8.7.3) are no longer all independent. The determination of the two quantities $\tilde a$, $\tilde b$ introduces two equations of constraint between the $u_i$. Therefore the number of degrees of freedom of the $\chi^2$-distribution for the sum of squares (8.7.4) is reduced by 2 to $N - 2$.
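The text does not say how the estimators $\tilde a$, $\tilde b$ are obtained. The sketch below assumes weighted least squares, the method developed in Chap. 9. It returns $\tilde a$, $\tilde b$, the sum $T$ of (8.7.4) evaluated with $f_i = \tilde a t_i + \tilde b$, and the reduced number of degrees of freedom $N - 2$. The function name and the data are illustrative.

```python
def fit_line(t, g, sigma):
    """Weighted least-squares estimates a~, b~ for h(t) = a t + b and the sum T of (8.7.4)
    evaluated with f_i = a~ t_i + b~; returns (a, b, T, N - 2)."""
    w = [1.0 / s ** 2 for s in sigma]                 # weights 1/sigma_i^2
    S = sum(w)
    St = sum(wi * ti for wi, ti in zip(w, t))
    Sg = sum(wi * gi for wi, gi in zip(w, g))
    Stt = sum(wi * ti * ti for wi, ti in zip(w, t))
    Stg = sum(wi * ti * gi for wi, ti, gi in zip(w, t, g))
    det = S * Stt - St ** 2                           # nonzero if at least two t_i differ
    a = (S * Stg - St * Sg) / det                     # solution of the 2 x 2 normal equations
    b = (Stt * Sg - St * Stg) / det
    T = sum(((gi - (a * ti + b)) / si) ** 2 for ti, gi, si in zip(t, g, sigma))
    return a, b, T, len(t) - 2


t = [0.0, 1.0, 2.0, 3.0, 4.0]
g = [1.1, 2.9, 5.2, 6.8, 9.1]            # illustrative measurements
sigma = [0.2] * 5
a, b, T, ndf = fit_line(t, g, sigma)
print(round(a, 3), round(b, 3), round(T, 2), ndf)  # compare T with chi^2_{1-alpha} for ndf
```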


8.7.3 χ²-Test and Empirical Frequency Distribution

Suppose we have a distribution function $F(x)$ and its probability density $f(x)$. The full region of the random variable $x$ can be divided into $r$ intervals

$$ \xi_1, \xi_2, \dots, \xi_r \,, $$

as shown in Fig. 8.7. By integrating $f(x)$ over the individual intervals we obtain the probability to observe $x$ in $\xi_i$,

$$ p_i = P(x \in \xi_i) = \int_{\xi_i} f(x)\,dx \,; \qquad \sum_{i=1}^{r} p_i = 1 \,. \tag{8.7.7} $$

We now take a sample of size $n$ and denote by $n_i$ the number of elements of the sample that fall into the interval $\xi_i$. An appropriate graphical representation of the sample is a histogram, as described in Sect. 6.3.


Fig. 8.7: Dividing the range of the variable $x$ into the intervals $\xi_i$.


One clearly has

$$ \sum_{i=1}^{r} n_i = n \,. \tag{8.7.8} $$

From the (hypothetical) probability density for the population we would have expected the value $n p_i$ for $n_i$. For large values of $n_i$, the variance of $n_i$ is approximately equal to $n_i$ (Sect. 6.8). Consider the quantity $u_i$ with

$$ u_i^2 = \frac{(n_i - n p_i)^2}{n_i} \,. \tag{8.7.9} $$

If the hypothesis is true, its distribution becomes approximately a standard Gaussian distribution. This holds also if one uses the expected variances $n p_i$ instead of the observed quantities $n_i$ in the denominator of (8.7.9),

$$ u_i^2 = \frac{(n_i - n p_i)^2}{n p_i} \,. \tag{8.7.10} $$

If we now construct the sum of squares of the $u_i$ for all intervals,

$$ X^2 = \sum_{i=1}^{r} u_i^2 = \sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i} \,, \tag{8.7.11} $$

then we expect (for large $n$) that this follows a $\chi^2$-distribution if the hypothesis is true. The number of degrees of freedom is $r - 1$, since the $u_i$ are not independent because of (8.7.8). The number of degrees of freedom is reduced to $r - 1 - p$ if, in addition, $p$ parameters are determined from the observations.
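The recipe (8.7.11) with its degree-of-freedom count fits in a short function. This is a sketch under stated assumptions: the function name is hypothetical, the interval probabilities $p_i$ are supplied by the caller, and the counts in the usage lines are illustrative.

```python
def pearson_chi2(counts, probs, n_fitted=0):
    """X^2 of (8.7.11) and its number of degrees of freedom r - 1 - p.

    counts   : observed numbers n_i in the r intervals
    probs    : hypothetical probabilities p_i of the intervals, summing to 1 (8.7.7)
    n_fitted : number p of parameters determined from the same observations
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the probabilities p_i must sum to 1")
    n = sum(counts)
    x2 = sum((ni - n * pi) ** 2 / (n * pi) for ni, pi in zip(counts, probs))
    return x2, len(counts) - 1 - n_fitted


x2, ndf = pearson_chi2([18, 22, 20, 40], [0.2, 0.2, 0.2, 0.4])
print(x2, ndf)
```

Since the $\chi^2$ behaviour is only approximate for large $n$, the result is trustworthy only when the expected numbers $n p_i$ are not too small.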
Example 8.7: $\chi^2$-test for the fit of a Poisson distribution to an empirical frequency distribution

In an experiment investigating photon-proton interactions, a beam of high-energy photons ($\gamma$-quanta) impinges on a hydrogen bubble chamber. The processes by which a photon materializes in the chamber, i.e., electron-positron pair conversions, are counted in order to obtain a measure of the intensity of the photon beam. The frequency of cases in which 0, 1, 2, … pairs are observed simultaneously, i.e., in the same bubble-chamber photograph, follows a Poisson distribution (see Example 5.3). Deviations from the Poisson distribution provide information about measurement losses, which are important for uncovering systematic errors. The results of observing $n = 355$ photographs are given in column 2 of Table 8.3 and in Fig. 8.8. From Example 7.4, we know that the maximum-likelihood estimator of the parameter of the Poisson distribution is $\tilde\lambda = \sum_k k\,n_k / \sum_k n_k$. We find $\tilde\lambda = 2.33$. The values $p_k$ of the Poisson distribution with this parameter, multiplied by $n$, are given in column 3. By summing the terms in column 4 one obtains the value $X^2 = 10.44$. The problem has six degrees of freedom, since $r = 8$, $p = 1$. We choose $\alpha = 1\,\%$ and find $\chi^2_{0.99} = 16.81$ from Table I.7. We therefore have no reason to reject the hypothesis of a Poisson distribution. ■


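Table 8.3: Data for the $\chi^2$-test from Example 8.7. The columns are as follows:

- $k$: number of electron pairs per photograph.
- $n_k$: number of photographs with $k$ electron pairs.
- $n p_k$: prediction of the Poisson distribution.
- $(n_k - n p_k)^2/(n p_k)$: contribution to $X^2$.

$$
\begin{array}{cccc}
k & n_k & n\,p_k & (n_k - n\,p_k)^2/(n\,p_k) \\
\hline
0 & 47 & 34.4 & 4.61 \\
1 & 69 & 80.2 & 1.56 \\
2 & 84 & 93.7 & 1.00 \\
3 & 76 & 72.8 & 0.14 \\
4 & 49 & 42.6 & 0.96 \\
5 & 16 & 19.9 & 0.76 \\
6 & 11 & 7.8 & 1.31 \\
7 & 3 & 2.5 & 0.10 \\
8 & \text{--} & (0.7) & \\
\hline
 & n = \sum_k n_k = 355 & & X^2 = 10.44
\end{array}
$$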
Fig. 8.8: Comparison of the experimental distribution $n_k$ (histogram with solid line) from Example 8.7 with the Poisson distribution $n p_k$ (dashed line).
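The numbers of Example 8.7 can be reproduced from columns 2 and 3 of Table 8.3. The sketch computes $\tilde\lambda$ from the observed counts, recomputes $n p_k$ from the Poisson formula for comparison, and forms $X^2$ with the tabulated predictions. As in the table, the row $k = 8$ is left out of $X^2$.

Summing the unrounded terms gives about 10.46. The 10.44 in the table is the sum of the rounded entries of column 4. The recomputed Poisson predictions agree with column 3 to within about 0.2, presumably because of rounding in the original calculation.

```python
import math

# Columns 2 and 3 of Table 8.3 for k = 0, 1, ..., 7
n_k = [47, 69, 84, 76, 49, 16, 11, 3]
np_table = [34.4, 80.2, 93.7, 72.8, 42.6, 19.9, 7.8, 2.5]

n = sum(n_k)                                           # number of photographs
lam = sum(k * nk for k, nk in enumerate(n_k)) / n      # ML estimate of the Poisson parameter
np_poisson = [n * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(n_k))]

x2 = sum((o - e) ** 2 / e for o, e in zip(n_k, np_table))
ndf = len(n_k) - 1 - 1                                 # r = 8 intervals, p = 1 fitted parameter
print(n, round(lam, 2), [round(e, 1) for e in np_poisson])
print(round(x2, 2), ndf, x2 < 16.81)                   # 16.81 = chi^2_{0.99} for 6 d.o.f.
```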


8.8  Contingency  Tables 


Suppose $n$ experiments have been carried out whose results are characterized by the values of two random variables $x$ and $y$. We consider the two variables as discrete, being able to take on the values $x_1, x_2, \dots, x_k$ and $y_1, y_2, \dots, y_l$. Continuous variables can be approximated by discrete ones by dividing their range into intervals, as shown in Fig. 8.7. Let the number of times the result $x = x_i$ and $y = y_j$ is observed be $n_{ij}$. One can arrange the numbers $n_{ij}$ in a matrix, called a contingency table (Table 8.4).


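Table 8.4: Contingency table.

$$
\begin{array}{c|cccc}
 & y_1 & y_2 & \cdots & y_l \\
\hline
x_1 & n_{11} & n_{12} & \cdots & n_{1l} \\
x_2 & n_{21} & n_{22} & \cdots & n_{2l} \\
\vdots & \vdots & \vdots & & \vdots \\
x_k & n_{k1} & n_{k2} & \cdots & n_{kl}
\end{array}
$$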

We denote by $p_i$ the probability for $x = x_i$ to occur, and by $q_j$ the probability for $y = y_j$. If the variables are independent, then the probability to simultaneously observe $x = x_i$ and $y = y_j$ is equal to the product $p_i q_j$. The maximum-likelihood estimators for $p$ and $q$ are

$$ \tilde p_i = \frac{1}{n}\sum_{j=1}^{l} n_{ij} \,, \qquad \tilde q_j = \frac{1}{n}\sum_{i=1}^{k} n_{ij} \,. $$

Since

$$ \sum_{i=1}^{k} \tilde p_i = \sum_{j=1}^{l} \tilde q_j = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} = 1 \,, $$

one has $k + l - 2$ independent estimators $\tilde p_i$, $\tilde q_j$. We can now organize the elements of the contingency table into a single line,

$$ n_{11}, n_{12}, \dots, n_{1l}, n_{21}, n_{22}, \dots, n_{2l}, \dots, n_{kl} \,, $$

and carry out a $\chi^2$-test. For this we must compute the quantity

$$ X^2 = \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(n_{ij} - n\tilde p_i \tilde q_j)^2}{n\tilde p_i \tilde q_j} \tag{8.8.1} $$

and compare it to the quantile $\chi^2_{1-\alpha}$ of the $\chi^2$-distribution corresponding to a given significance level $\alpha$. The number of degrees of freedom is still obtained from the number of intervals minus the number of estimated parameters minus one,

$$ f = kl - 1 - (k + l - 2) = (k - 1)(l - 1) \,. $$

If the variables are not independent, then $n_{ij}$ will not, in general, be near $n\tilde p_i \tilde q_j$, i.e., one will find

$$ X^2 > \chi^2_{1-\alpha} \tag{8.8.2} $$

and the hypothesis will be rejected.
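The computation (8.8.1) works directly on the table of counts. The sketch below estimates $\tilde p_i$ and $\tilde q_j$ from the row and column sums, forms $X^2$, and returns $f = (k-1)(l-1)$. The function name and the $3 \times 2$ counts are hypothetical.

```python
def contingency_chi2(table):
    """X^2 of (8.8.1) for a k x l contingency table (list of rows of counts n_ij)
    and the number of degrees of freedom f = (k - 1)(l - 1)."""
    k, l = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    p = [sum(row) / n for row in table]                              # p~_i: row sums / n
    q = [sum(table[i][j] for i in range(k)) / n for j in range(l)]   # q~_j: column sums / n
    x2 = 0.0
    for i in range(k):
        for j in range(l):
            expected = n * p[i] * q[j]
            x2 += (table[i][j] - expected) ** 2 / expected
    return x2, (k - 1) * (l - 1)


x2, f = contingency_chi2([[12, 18], [20, 20], [30, 10]])   # hypothetical counts
print(round(x2, 3), f)   # compare x2 with chi^2_{1-alpha} for f degrees of freedom
```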


8.9 2 × 2 Table Test

The simplest nontrivial contingency table has only two rows and two columns, and is called a $2 \times 2$ table, as shown in Table 8.5. It is often used in medical studies. (The variables $x_1$ and $x_2$ could represent, for example, two different treatment methods, and $y_1$ and $y_2$ could represent success and failure of the treatment. One wishes to determine whether success is independent of the treatment.)

Table 8.5: $2 \times 2$ table.

$$
\begin{array}{c|cc}
 & y_1 & y_2 \\
\hline
x_1 & n_{11} = a & n_{12} = b \\
x_2 & n_{21} = c & n_{22} = d
\end{array}
$$
One computes the quantity $X^2$ either according to (8.8.1) or from the formula

$$ X^2 = \frac{n(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)} \,, $$

which is obtained by rearranging (8.8.1). If the variables $x$ and $y$ are independent, then $X^2$ follows a $\chi^2$-distribution with one degree of freedom. One rejects the hypothesis of independence at the significance level $\alpha$ if

$$ X^2 > \chi^2_{1-\alpha} \,. $$

For the quantity $X^2$ to actually follow a $\chi^2$-distribution, the individual $n_{ij}$ must again be sufficiently large (and the hypothesis of independence must be true).
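The shortcut formula for the $2 \times 2$ table can be checked against a direct evaluation of (8.8.1). In the sketch, the counts are hypothetical (two treatments against success and failure). The value 3.84 is the tabulated $\chi^2_{0.95}$ for one degree of freedom.

```python
def chi2_2x2(a, b, c, d):
    """Shortcut formula for X^2 of a 2 x 2 table (notation of Table 8.5)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_general(table):
    """Direct evaluation of (8.8.1); expected counts are row sum * column sum / n."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))


a, b, c, d = 30, 10, 15, 25   # hypothetical counts
x2 = chi2_2x2(a, b, c, d)
print(round(x2, 3), round(chi2_general([[a, b], [c, d]]), 3))
print("reject independence at 5%:", x2 > 3.84)
```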


8.10  Example  Programs 

Example Program 8.1: The class E1Test generates samples and tests the equality of their variances

The program performs a total of $n_{\rm exp}$ simulated experiments. Each experiment consists of the simulation of two samples of sizes $N_1$ and $N_2$ from normal distributions with standard deviations $\sigma_1$ and $\sigma_2$. The variance of each of the samples is computed using the class Sample. The sample variances are called $S_g^2$ and $S_k^2$, so that

$$ S_g^2 \ge S_k^2 \,. $$

From the corresponding sample sizes the numbers of degrees of freedom $f_g = N_g - 1$ and $f_k = N_k - 1$ are computed. Finally, the ratio $S_g^2/S_k^2$ is compared with the quantile $F_{1-\alpha/2}(f_g, f_k)$ at a given confidence level $\beta = 1 - \alpha$. If the ratio is larger than the quantile, then the hypothesis of equal variances has to be rejected. The program asks for the quantities $n_{\rm exp}$, $N_1$, $N_2$, $\sigma_1$, $\sigma_2$, and $\beta$. For each simulated experiment one line of output is displayed.

Suggestions: Choose $n_{\rm exp} = 20$ and $\beta = 0.9$. (a) For $\sigma_1 = \sigma_2$ you would expect the hypothesis to be rejected, on average, in 2 out of 20 cases because of an error of the first kind. Note the large statistical fluctuations, which obviously depend on $N_1$ and $N_2$, and choose different pairs of values $N_1$, $N_2$ for $\sigma_1 = \sigma_2$. (b) Check the power of the test for different variances $\sigma_1 \ne \sigma_2$.

Example Program 8.2: The class E2Test generates samples and tests the equality of their means with a given value using Student's test

This short program performs $n_{\rm exp}$ simulated experiments. In each experiment a sample of size $N$ is drawn from a normal distribution with mean $x_0$ and width $\sigma$. Using the class Sample, the sample mean $\bar x$ and the sample variance $S_x^2$ are determined.

If $\lambda_0$ is the population mean specified by the hypothesis, then the quantity

$$ t = \frac{(\bar x - \lambda_0)\sqrt{N}}{S_x} $$

can be used to test the hypothesis. At a given confidence level $\beta = 1 - \alpha$ the hypothesis is rejected if

$$ |t| > t_{1-\alpha/2} \,. $$

Here $t_{1-\alpha/2}$ is the quantile of Student's distribution with $f = N - 1$ degrees of freedom. The program asks for the quantities $n_{\rm exp}$, $N$, $x_0$, $\sigma$, $\lambda_0$, and $\beta$. For each simulated experiment one line of output is displayed.

Suggestion: Modify the suggestions at the end of Example Program 8.1 to apply to Student's test.


Example Program 8.3: The class E3Test generates samples and computes the test statistic $\chi^2$ for the hypothesis that the samples are taken from a normal distribution with known parameters

For samples of size $N$, the program tests the hypothesis $H_0$ that they stem from a normal distribution with mean $a_0$ and standard deviation $\sigma_0$. A total of $n_{\rm exp}$ samples are drawn in simulated experiments from a normally distributed population with mean $a$ and standard deviation $\sigma$. For each sample the quantity

$$ X^2 = \sum_{i=1}^{N} \left(\frac{x_i - a_0}{\sigma_0}\right)^2 \tag{8.10.1} $$

is computed. Here $x_i$ are the elements of the sample. If $a = a_0$ and $\sigma = \sigma_0$, then the quantity $X^2$ follows a $\chi^2$-distribution for $N$ degrees of freedom. This quantity can therefore be used to perform a $\chi^2$-test on the hypothesis $H_0$. The program does not, however, perform the $\chi^2$-test. Instead it displays a histogram of the quantity $X^2$ together with the $\chi^2$-distribution. One observes that for $a = a_0$ and $\sigma = \sigma_0$ the histogram and the $\chi^2$-distribution indeed coincide within statistical fluctuations. If, however, $a \ne a_0$ and/or $\sigma \ne \sigma_0$, then deviations appear. These deviations become particularly clear if instead of $X^2$ the following quantity is displayed:

$$ P(X^2) = 1 - F(X^2; N) \tag{8.10.2} $$

Here $F(X^2; N)$ is the distribution function (C.5.2) of the $\chi^2$-distribution for $N$ degrees of freedom. $F(X^2; N)$ is equal to the probability that a random variable drawn from a $\chi^2$-distribution is smaller than $X^2$. Thus, $P$ is the probability that it is greater than or equal to $X^2$. If the hypothesis $H_0$ is true, then $F$ and therefore also $P$ follow uniform distributions between 0 and 1. If, however, $H_0$ is false, then the distribution of $X^2$ is not a $\chi^2$-distribution, and the distribution of $P$ is not a uniform distribution.

The test statistic $X^2$ is often (not completely correctly) simply called “$\chi^2$”, and the quantity $P$ is then called the “$\chi^2$-probability”. Large values of $X^2$ obviously signify that the terms in the sum (8.10.1) are on average large compared to unity, i.e., that the $x_i$ deviate significantly from $a_0$ in units of $\sigma_0$. For large values of $X^2$, however, $P$ becomes small, cf. (8.10.2). Large values of “$\chi^2$” therefore correspond to small values of the “$\chi^2$-probability”. The hypothesis $H_0$ is rejected at the confidence level $\beta = 1 - \alpha$ if $X^2 > \chi^2_{1-\alpha}(N)$. That is equivalent to $F(X^2; N) > \beta$ or $P < \alpha$.
The program allows for interactive input of the quantities $n_{\rm exp}$, $N$, $a$, $a_0$, $\sigma$, $\sigma_0$ and displays the distributions of both $X^2$ and $P(X^2)$ in the form of histograms.

Suggestions:

(a) Choose $n_{\rm exp} = 1000$, $a = a_0 = 0$ and $\sigma = \sigma_0 = 1$. For $N = 1$, $N = 2$, and $N = 10$ display both $X^2$ and $P(X^2)$.

(b) Repeat (a), keeping $a = 0$ and choosing $a_0 = 1$ and $a_0 = 5$. Explain the shift of the histogram for $P(X^2)$.

(c) Repeat (a), keeping $\sigma = 1$ fixed and choosing $\sigma_0 = 0.5$ and $\sigma_0 = 2$. Discuss the results.

(d) Modify the program so that instead of $a_0$ and $\sigma_0^2$, the sample mean $\bar x$ and sample variance $S^2$ are used for the computation of $X^2$. The quantity $X^2$ can be used for a $\chi^2$-test of the hypothesis that the samples were drawn from a normal distribution. Display histograms of $X^2$ and $P(X^2)$ and show that $X^2$ follows a $\chi^2$-distribution with $N - 2$ degrees of freedom.


9.  The  Method  of  Least  Squares 


The  method  of  least  squares  was  first  developed  by  LEGENDRE  and  GAUSS. 
In  the  simplest  case  it  consists  of  the  following  prescription: 

The repeated measurements $y_j$ can be treated as the sum of the (unknown) quantity $x$ and the measurement error $\varepsilon_j$,

$$ y_j = x + \varepsilon_j \,. $$

The quantity $x$ should be determined such that the sum of squares of the errors $\varepsilon_j$ is a minimum,

$$ \sum_j \varepsilon_j^2 = \sum_j (x - y_j)^2 = \min \,. $$

We  will  see  that  in  many  cases  this  prescription  can  be  derived  as  a  result 
of  the  principle  of  maximum  likelihood,  which  historically  was  developed 
much  later,  but  that  in  other  cases  as  well,  it  provides  results  with  optimal 
properties.  The  method  of  least  squares,  which  is  the  most  widely  used  of  all 
statistical  methods,  can  also  be  used  in  the  case  where  the  measured  quantities 
$y_j$ are not directly related to the unknown $x$, but rather indirectly, i.e., as a linear (or also nonlinear) combination of several unknowns $x_1, x_2, \dots$. Because
of  the  great  practical  significance  of  the  method  we  will  illustrate  the  various 
cases  individually  with  examples  before  turning  to  the  most  general  case. 

9.1  Direct  Measurements  of  Equal  or  Unequal  Accuracy 

The simplest case of direct measurements with equal accuracy has already been mentioned. Suppose one has carried out $n$ measurements of an unknown quantity $x$. The measured quantities $y_j$ have measurement errors $\varepsilon_j$. We now make the additional assumption that these are normally distributed about zero:

$$ y_j = x + \varepsilon_j \,, \qquad E(\varepsilon_j) = 0 \,, \qquad E(\varepsilon_j^2) = \sigma^2 \,. \tag{9.1.1} $$


This assumption can be justified in many cases by the Central Limit Theorem. The probability to observe the value $y_j$ as the result of a single measurement is thus proportional to

$$ \exp\left(-\frac{(y_j - x)^2}{2\sigma^2}\right) . $$

The log-likelihood function for all $n$ measurements is thus (see Example 7.2)

$$ \ell = -\frac{1}{2\sigma^2}\sum_{j=1}^{n} (y_j - x)^2 + \text{const} \,. $$

The maximum likelihood condition,

$$ \ell = \max \,, \tag{9.1.2} $$

is thus equivalent to

$$ M = \sum_{j=1}^{n} (y_j - x)^2 = \sum_{j=1}^{n} \varepsilon_j^2 = \min \,. \tag{9.1.3} $$

This is exactly the least-squares prescription. As we have shown in Examples 7.2 and 7.6, this leads to the result that the best estimator for $x$ is given by the arithmetic mean of the $y_j$,

$$ \tilde x = \bar y = \frac{1}{n}\sum_{j=1}^{n} y_j \,. \tag{9.1.4} $$

The variance of this estimator is

$$ \sigma^2(\bar y) = \sigma^2/n \,, \tag{9.1.5} $$

or, if we identify the measurement errors with the standard deviations,

$$ \Delta\tilde x = \Delta y/\sqrt{n} \,. \tag{9.1.6} $$

The more general case of direct measurements of different accuracy has also already been treated in Example 7.6. Let us assume again a normal distribution centered about zero for the measurement errors, i.e.,

$$ y_j = x + \varepsilon_j \,, \qquad E(\varepsilon_j) = 0 \,, \qquad E(\varepsilon_j^2) = \sigma_j^2 = 1/g_j \,. \tag{9.1.7} $$
Comparing with (7.2.7) gives the requirement of the maximum-likelihood method

$$ M = \sum_{j=1}^{n} \frac{(y_j - x)^2}{\sigma_j^2} = \sum_{j=1}^{n} g_j (y_j - x)^2 = \sum_{j=1}^{n} g_j \varepsilon_j^2 = \min \,. \tag{9.1.8} $$

The individual terms in the sum are now weighted with the inverse of the variances. The best estimator for $x$ is then [cf. (7.2.8)]

$$ \tilde x = \frac{\sum_{j=1}^{n} g_j y_j}{\sum_{j=1}^{n} g_j} \,, \tag{9.1.9} $$

i.e., the weighted mean of the individual measurements. The larger the measurement error of an individual measurement, the less it contributes to the final result. From (7.3.20) we know the variance of $\tilde x$; it is

$$ \sigma^2(\tilde x) = \left(\sum_{j=1}^{n} g_j\right)^{-1} . \tag{9.1.10} $$


We can use the result (9.1.9) to compute the best estimates $\tilde\varepsilon_j$ of the original measurement errors $\varepsilon_j$ from (9.1.1). We obtain

$$ \tilde\varepsilon_j = y_j - \tilde x \,. $$

We expect these quantities to be normally distributed about zero with the variance $\sigma_j^2$. That is, the quantities $\tilde\varepsilon_j/\sigma_j$ should follow a standard Gaussian distribution. According to Sect. 6.6, the sum of squares

$$ M = \sum_{j=1}^{n} \left(\frac{\tilde\varepsilon_j}{\sigma_j}\right)^2 = \sum_{j=1}^{n} \frac{(y_j - \tilde x)^2}{\sigma_j^2} = \sum_{j=1}^{n} g_j (y_j - \tilde x)^2 \tag{9.1.11} $$

then follows a $\chi^2$-distribution with $n - 1$ degrees of freedom.

This property of the quantity $M$ can now be used to carry out a $\chi^2$-test on the validity of the assumption (9.1.7). If, for a given significance level $\alpha$, the quantity $M$ exceeds the value $\chi^2_{1-\alpha}$, then we would have to recheck the assumption (9.1.7). Usually one does not doubt that the $y_j$ are in fact measurements of the unknown $x$. It may be, however, that the errors $\varepsilon_j$ are not normally distributed. In particular, the measurements may also be biased, i.e., the expectation value of the errors $\varepsilon_j$ may be different from zero. The presence of such systematic errors can often be inferred from the failure of the $\chi^2$-test.
Example 9.1: Weighted mean of measurements of different accuracy

The best values for constants of fundamental significance, such as important constants of nature, are usually obtained as weighted averages of measurements obtained by different experimental groups. For the properties of elementary particles, such mean values are compiled at regular intervals. We will consider as an example somewhat older measurements of the mass of the neutral K meson ($K^0$), taken from such a compilation from 1967 [10]. An average was computed from the results of four experiments, all carried out with different techniques. The calculation can be carried out following the scheme of Table 9.1. The resulting value of $M$ is 7.2. If we choose a significance level of 5 %, we find from Table I.7 for three degrees of freedom $\chi^2_{0.95} = 7.82$. At the time of the averaging, one could therefore assume that the result $m_{K^0} = (497.9 \pm 0.2)$ MeV represented the best value for the mass of the K meson, as long as no further experiments were carried out. (More than 40 years later the weighted mean of all measurements was $m_{K^0} = (497.614 \pm 0.024)$ MeV [11].) ■


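Table 9.1: Construction of the weighted mean from four measurements of the mass of the neutral K meson. The $y_j$ are the measured values in MeV.

$$
\begin{array}{c|cccccc}
j & y_j & \sigma_j & 1/\sigma_j^2 = g_j & y_j g_j & y_j - \tilde x & (y_j - \tilde x)^2 g_j \\
\hline
1 & 498.1 & 0.4 & 6.3 & 3138.0 & 0.2 & 0.3 \\
2 & 497.44 & 0.33 & 10 & 4974.4 & -0.46 & 2.1 \\
3 & 498.9 & 0.5 & 4 & 1995.6 & 1.0 & 4.0 \\
4 & 497.44 & 0.5 & 4 & 1989.8 & -0.46 & 0.8 \\
\hline
\sum & & & 24.3 & 12097.8 & & 7.2
\end{array}
$$

$$ \tilde x = \frac{\sum_j y_j g_j}{\sum_j g_j} = 497.9 \,, \qquad \Delta\tilde x = \Bigl(\sum_j g_j\Bigr)^{-1/2} = 0.20 $$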
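The scheme of Table 9.1 can be recomputed directly. The sketch applies (9.1.9)-(9.1.11) twice. The first pass uses the rounded weights listed in the $g_j$ column, which reproduces the table: $\tilde x \approx 497.9$, $\Delta\tilde x \approx 0.20$, $M \approx 7.2$. The second pass uses $g_j = 1/\sigma_j^2$ computed exactly from the listed $\sigma_j$. That gives slightly different values ($\tilde x \approx 497.87$, $M \approx 7.0$), because the table rounds the weights. The helper name is hypothetical.

```python
import math


def weighted_mean(y, g):
    """Weighted mean (9.1.9), its error from (9.1.10), and M of (9.1.11)."""
    sg = sum(g)
    x = sum(gj * yj for gj, yj in zip(g, y)) / sg
    dx = 1.0 / math.sqrt(sg)
    M = sum(gj * (yj - x) ** 2 for gj, yj in zip(g, y))
    return x, dx, M


y = [498.1, 497.44, 498.9, 497.44]      # measured K0 masses in MeV (Table 9.1)
sigma = [0.4, 0.33, 0.5, 0.5]

# Rounded weights as listed in the g_j column of Table 9.1
x_t, dx_t, M_t = weighted_mean(y, [6.3, 10.0, 4.0, 4.0])
# Weights computed exactly as 1/sigma_j^2 from the listed sigma_j
x_e, dx_e, M_e = weighted_mean(y, [1.0 / s ** 2 for s in sigma])

print(round(x_t, 1), round(dx_t, 2), round(M_t, 1))
print(round(x_e, 2), round(dx_e, 2), round(M_e, 2))
print("chi^2 test passed:", M_t < 7.82)  # chi^2_{0.95} for n - 1 = 3 d.o.f.
```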

Let us now consider the case where the $\chi^2$-test fails. As mentioned above, one usually assumes that at least one of the measurements has a systematic error. By investigation of the individual measurements, one can sometimes determine that one or two measurements deviate from the others by a large amount. Such a case is illustrated in Fig. 9.1a, where several different measurements are shown with their errors. (The measured value is plotted along the vertical axis; the horizontal axis merely distinguishes between the different measurements.) Although the $\chi^2$-test would fail if all measurements from Fig. 9.1a are used, this is not the case if measurements 4 and 6 are excluded.

Unfortunately the situation is not always so clear. Figure 9.1b shows a further example where the $\chi^2$-test fails. (According to Chap. 8 one would reject the hypothesis that the measurements are determinations of the same quantity.) There is no single measurement, however, that is responsible for this fact. It would now be mathematically correct not to give any average value at all, and to make no statement about a best value, as long as no further measurements are available. In practice, this is clearly not satisfactory.

Fig. 9.1: Averaging of 10 measurements where the $\chi^2$-test fails. (a) Anomalous deviation of certain measurements: (1) averaging of all measurements; (2) averaging without measurements 4 and 6. (b) Errors of individual measurements clearly too small: (1) error of the mean according to (9.1.10); (2) error of the mean according to (9.1.13).

Rosenfeld et al. [10] have suggested that the individual measurement errors should be increased by a scale factor $\sqrt{M/(n-1)}$, i.e., one replaces the $\sigma_j$ by

$$ \sigma'_j = \sigma_j \sqrt{\frac{M}{n-1}} \,. \tag{9.1.12} $$


The weighted mean $\tilde x$ obtained by using these measurement errors does not differ from that of expression (9.1.9). The variance, however, is different from (9.1.10). It becomes

$$ \sigma'^2(\tilde x) = \left(\sum_{j=1}^{n} \frac{1}{\sigma_j'^2}\right)^{-1} = \frac{M}{n-1}\left(\sum_{j=1}^{n} g_j\right)^{-1} . \tag{9.1.13} $$

We now compute the analogous expression to (9.1.11),

$$ M' = \sum_{j=1}^{n} \frac{(y_j - \tilde x)^2}{\sigma_j'^2} = \frac{n-1}{M}\,M = n - 1 \,. \tag{9.1.14} $$

This is the expectation value of $\chi^2$ for $n - 1$ degrees of freedom. Equation (9.1.14) clearly provided the motivation for relation (9.1.12). We repeat that this relation has no rigorous mathematical basis. It should be used with caution, since it hides the influence of systematic errors. On the other hand, it provides reasonable errors for the mean in cases such as that in Fig. 9.1b. Direct application of Eq. (9.1.10), by contrast, leads to an error that is far too small to reflect the actual dispersion of the individual measurements about the mean. Both solutions for the error of the mean are shown in Fig. 9.1b.
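The scale-factor prescription (9.1.12)-(9.1.14) can be illustrated on a hypothetical set of measurements whose quoted errors are clearly too small, as in Fig. 9.1b. The sketch checks three things. First, the weighted mean is unchanged by the scaling. Second, the error of the mean grows by $\sqrt{M/(n-1)}$. Third, $M$ recomputed with the scaled errors equals $n - 1$. The data and the helper name are illustrative only.

```python
import math


def scaled_average(y, sigma):
    """Weighted mean with the error scaled according to (9.1.12)-(9.1.13)."""
    n = len(y)
    g = [1.0 / s ** 2 for s in sigma]
    x = sum(gj * yj for gj, yj in zip(g, y)) / sum(g)            # (9.1.9)
    M = sum(gj * (yj - x) ** 2 for gj, yj in zip(g, y))          # (9.1.11)
    scale = math.sqrt(M / (n - 1))                               # scale factor of (9.1.12)
    sigma_scaled = [s * scale for s in sigma]
    dx = 1.0 / math.sqrt(sum(g))                                 # square root of (9.1.10)
    dx_scaled = dx * scale                                       # square root of (9.1.13)
    return x, dx, dx_scaled, sigma_scaled, M


y = [10.0, 10.9, 9.2, 11.5, 8.6]          # hypothetical measurements
sigma = [0.2] * 5                          # quoted errors, clearly too small
x, dx, dx_scaled, sigma_scaled, M = scaled_average(y, sigma)
# Recompute M with the scaled errors; by (9.1.14) it equals n - 1
M_scaled = sum((yj - x) ** 2 / s ** 2 for yj, s in zip(y, sigma_scaled))
print(round(x, 3), round(dx, 3), round(dx_scaled, 3), round(M, 1), round(M_scaled, 6))
```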


9.2  Indirect  Measurements:  Linear  Case 

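Let us now consider the more general case of several unknown quantities $x_i$ ($i = 1, 2, \dots, r$). The unknowns are often not measured directly. Instead, only a set of linear functions of the $x_i$ is measurable,

$$ \eta_j = p_{j0} + p_{j1} x_1 + p_{j2} x_2 + \cdots + p_{jr} x_r \,. \tag{9.2.1} $$

We now write this relation in a somewhat different form (i.e., with $a_{j0} = -p_{j0}$ and $a_{jk} = -p_{jk}$),

$$ f_j = \eta_j + a_{j0} + a_{j1} x_1 + a_{j2} x_2 + \cdots + a_{jr} x_r = 0 \,. \tag{9.2.2} $$

We define a column vector,

$$ a_j = \begin{pmatrix} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jr} \end{pmatrix} , \tag{9.2.3} $$

and write (9.2.2) in the more compact form,

$$ f_j = \eta_j + a_{j0} + a_j^{\rm T} x = 0 \,. \tag{9.2.4} $$

If we define in addition

$$ \eta = \begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_n \end{pmatrix} , \quad a_0 = \begin{pmatrix} a_{10} \\ a_{20} \\ \vdots \\ a_{n0} \end{pmatrix} , \quad A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nr} \end{pmatrix} , \tag{9.2.5} $$

then the system of equations (9.2.4) can be written as a matrix equation,

$$ f = \eta + a_0 + Ax = 0 \,. \tag{9.2.6} $$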

Of course the measured quantities still have measurement errors $\varepsilon_j$, which we assume to be normally distributed. We then have*

$$ y_j = \eta_j + \varepsilon_j \,, \qquad E(\varepsilon_j) = 0 \,, \qquad E(\varepsilon_j^2) = \sigma_j^2 = 1/g_j \,. \tag{9.2.7} $$

Since the $y_j$ are independent measurements, we can arrange the variances $\sigma_j^2$ in a diagonal covariance matrix for $y_j$ or $\varepsilon_j$,

$$ C_y = C_\varepsilon = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{pmatrix} . \tag{9.2.8} $$

In analogy to (9.1.7) we call the inverse of the covariance matrix a weight matrix,

$$ G_y = G_\varepsilon = C_y^{-1} = C_\varepsilon^{-1} = \begin{pmatrix} g_1 & & 0 \\ & \ddots & \\ 0 & & g_n \end{pmatrix} . \tag{9.2.9} $$

If we now put the measurements and errors together into vectors, we obtain from (9.2.7)

$$ y = \eta + \varepsilon \,. \tag{9.2.10} $$

From (9.2.6) one then has

$$ y - \varepsilon + a_0 + Ax = 0 \,. \tag{9.2.11} $$

We want to solve this system of equations for the unknowns $x$ with the maximum-likelihood method. With our assumption (9.2.7), the measurements $y_j$ are normally distributed with the probability density

$$ f(y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(y_j - \eta_j)^2}{2\sigma_j^2}\right) . \tag{9.2.12} $$

*For simplicity of notation we no longer write random variables with a special character type. From context it will always be evident which variables are random.
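Thus for all measurements one obtains the likelihood function

$$ L = \prod_{j=1}^{n} f(y_j) = (2\pi)^{-n/2} \left(\prod_{j=1}^{n} \frac{1}{\sigma_j}\right) \exp\left(-\frac{1}{2}\sum_{j=1}^{n} \frac{(y_j - \eta_j)^2}{\sigma_j^2}\right) . \tag{9.2.13} $$

This expression is clearly a maximum when

$$ \sum_{j=1}^{n} \frac{(y_j - \eta_j)^2}{\sigma_j^2} = \min \,, \tag{9.2.14} $$

i.e., when

$$ M = \sum_{j=1}^{n} \frac{\varepsilon_j^2}{\sigma_j^2} = \sum_{j=1}^{n} \frac{(y_j + a_j^{\rm T} x + a_{j0})^2}{\sigma_j^2} = \min \,. \tag{9.2.15} $$

Using (9.2.9) and (9.2.11), we can rewrite this expression as

$$ M = \varepsilon^{\rm T} G_y \varepsilon = \min \tag{9.2.16} $$

or

$$ M = (y + a_0 + Ax)^{\rm T} G_y (y + a_0 + Ax) = \min \,, \tag{9.2.17} $$

or in abbreviated form

$$ c = y + a_0 \,, \tag{9.2.18} $$

$$ M = (c + Ax)^{\rm T} G_y (c + Ax) = \min \,. \tag{9.2.19} $$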

We will simplify this expression further by using the Cholesky decomposition (cf. Sect. A.9) of the positive-definite symmetric weight matrix $G_y$,

$$ G_y = H^{\rm T} H \,. \tag{9.2.20} $$

In the frequently occurring case of uncorrelated measurements (9.2.9) one has

$$ H = H^{\rm T} = \begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix} . \tag{9.2.21} $$

Using the notation

$$ c' = Hc \,, \qquad A' = HA \,, \tag{9.2.22} $$

Eq. (9.2.19) takes on the simple form

$$ M = (A'x + c')^2 = \min \,. \tag{9.2.23} $$


The  method  for  solving  this  equation  for  x  is  described  in  detail  in  Ap¬ 
pendix  A,  in  particular  in  Sects.  A.5  through  A.  14.  The  solution  can  be  written 
in  the  form 


x  =  —  A/+c/ 


(9.2.24) 
[See (A.10.3).] Here $A'^{+}$ is the pseudo-inverse of the matrix $A'$ (see Sect. A.10). In Sect. A.14, a solution of the form (9.2.24) is indeed found with the help of the singular value decomposition of the matrix $A'$. This procedure is particularly accurate numerically.

To compute by hand, one uses instead of Eq. (9.2.24) the mathematically equivalent expression for the solution of the normal equations [see (A.5.17)],

$$ \tilde x = -(A'^{\mathrm T} A')^{-1} A'^{\mathrm T} c' \qquad (9.2.25) $$

or, with (9.2.22), in terms of the quantities $c$ and $A$,

$$ \tilde x = -(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y c . \qquad (9.2.26) $$
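The normal-equation route (9.2.26) can be sketched in Java, the language of the book's own classes. This is a minimal sketch under stated assumptions:

- the measurements are uncorrelated, so $G_y = \mathrm{diag}(1/\sigma_j^2)$;
- the problem is small and well conditioned;
- plain Gaussian elimination is used instead of the numerically more accurate singular value decomposition of Sect. A.14.

The class WeightedLinearLsq and its methods are hypothetical and are not part of the book's library.

```java
// Minimal sketch of (9.2.26): x~ = -(A^T G_y A)^{-1} A^T G_y c for diagonal G_y.
// WeightedLinearLsq is a hypothetical name, not one of the book's classes.
final class WeightedLinearLsq {

    /** Solves the normal equations for the convention epsilon = c + A x. */
    static double[] solve(double[][] a, double[] c, double[] sigma) {
        int n = a.length, r = a[0].length;
        double[][] ata = new double[r][r]; // A'^T A' = A^T G_y A
        double[] b = new double[r];        // -A'^T c' = -A^T G_y c
        for (int j = 0; j < n; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]); // diagonal element of G_y
            for (int k = 0; k < r; k++) {
                b[k] -= a[j][k] * w * c[j];
                for (int l = 0; l < r; l++) {
                    ata[k][l] += a[j][k] * w * a[j][l];
                }
            }
        }
        return gauss(ata, b);
    }

    /** Gaussian elimination with partial pivoting; overwrites m and b. */
    static double[] gauss(double[][] m, double[] b) {
        int r = b.length;
        for (int p = 0; p < r; p++) {
            int best = p;
            for (int i = p + 1; i < r; i++) {
                if (Math.abs(m[i][p]) > Math.abs(m[best][p])) best = i;
            }
            double[] rowTmp = m[p]; m[p] = m[best]; m[best] = rowTmp;
            double bTmp = b[p]; b[p] = b[best]; b[best] = bTmp;
            for (int i = p + 1; i < r; i++) {
                double f = m[i][p] / m[p][p];
                for (int k = p; k < r; k++) m[i][k] -= f * m[p][k];
                b[i] -= f * b[p];
            }
        }
        double[] x = new double[r];
        for (int i = r - 1; i >= 0; i--) { // back substitution
            double s = b[i];
            for (int k = i + 1; k < r; k++) s -= m[i][k] * x[k];
            x[i] = s / m[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // Straight line of Sect. 9.3: a0 = 0, so c = y, and row j of A is -(1, t_j).
        double[] t = {0.0, 1.0, 2.0, 3.0};
        double[] y = {1.4, 1.5, 3.7, 4.1};
        double[] sigma = {0.5, 0.2, 1.0, 0.5};
        double[][] a = new double[t.length][];
        for (int j = 0; j < t.length; j++) a[j] = new double[] {-1.0, -t[j]};
        double[] x = solve(a, y, sigma);
        System.out.printf("x1 = %.3f, x2 = %.3f%n", x[0], x[1]); // x1 = 0.636, x2 = 1.066
    }
}
```

Running main reproduces the hand calculation of Sect. 9.3. For larger or nearly singular problems the SVD route of (9.2.24) is preferable.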


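The solution includes, of course, the special case of Sect. 9.1. In the case of direct measurements of different accuracy, $x$ has only one element, $a_0$ vanishes, and $A$ is simply an $n$-component column vector whose elements are all equal to $-1$. One then has

$$ c' = \begin{pmatrix} y_1/\sigma_1 \\ y_2/\sigma_2 \\ \vdots \\ y_n/\sigma_n \end{pmatrix} , \qquad A' = -\begin{pmatrix} 1/\sigma_1 \\ 1/\sigma_2 \\ \vdots \\ 1/\sigma_n \end{pmatrix} , $$

and (9.2.25) becomes

$$ \tilde x = \left( \sum_{j=1}^{n} \frac{1}{\sigma_j^2} \right)^{-1} \sum_{j=1}^{n} \frac{y_j}{\sigma_j^2} , $$

which is identical to (9.1.9).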
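For $r = 1$ the general solution collapses to the weighted mean. A minimal Java sketch of this special case follows. The error it returns, $(\sum 1/\sigma_j^2)^{-1/2}$, anticipates the covariance formula (9.2.27). WeightedMean is a hypothetical name, and the input values in main are invented for illustration.

```java
// Sketch of the special case r = 1: weighted mean of direct measurements.
// WeightedMean is a hypothetical name used only for illustration.
final class WeightedMean {

    /** Returns {mean, error} with mean = sum(y/s^2)/sum(1/s^2), error = (sum 1/s^2)^(-1/2). */
    static double[] of(double[] y, double[] sigma) {
        double sumW = 0.0, sumWy = 0.0;
        for (int j = 0; j < y.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]); // weight 1/sigma_j^2
            sumW += w;
            sumWy += w * y[j];
        }
        return new double[] {sumWy / sumW, 1.0 / Math.sqrt(sumW)};
    }

    public static void main(String[] args) {
        double[] r = of(new double[] {10.0, 20.0}, new double[] {1.0, 2.0});
        System.out.printf("mean = %.2f +- %.3f%n", r[0], r[1]); // mean = 12.00 +- 0.894
    }
}
```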

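The solution (9.2.26) represents a linear relation between the solution vector $\tilde x$ and the vector of measurements $y$, since $c = y + a_0$. We can thus apply the error propagation techniques of Sect. 3.8. Using (3.8.2) and (3.8.4) one immediately obtains

$$ C_{\tilde x} = G_{\tilde x}^{-1} = \left[(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y\right] G_y^{-1} \left[(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y\right]^{\mathrm T} . $$

The matrices $G_y$, $G_y^{-1}$, and $(A^{\mathrm T} G_y A)$ are symmetric, i.e., they are identical to their transposed matrices. Using the rule (A.1.8), this expression simplifies to

$$ G_{\tilde x}^{-1} = (A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y G_y^{-1} G_y A (A^{\mathrm T} G_y A)^{-1} = (A^{\mathrm T} G_y A)^{-1} (A^{\mathrm T} G_y A) (A^{\mathrm T} G_y A)^{-1} , $$

$$ G_{\tilde x}^{-1} = (A^{\mathrm T} G_y A)^{-1} = (A'^{\mathrm T} A')^{-1} . \qquad (9.2.27) $$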
We have thus obtained a simple expression for the covariance matrix of the estimators $\tilde x$ for the unknowns $x$. The square roots of the diagonal elements of this matrix can be viewed as “measurement errors”, although the quantities $x$ were not directly measured.
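The covariance matrix (9.2.27) only requires inverting the small $r \times r$ matrix $A^{\mathrm T} G_y A$. A hedged Java sketch follows. It assumes a diagonal $G_y$ and uses Gauss-Jordan inversion with partial pivoting, which is adequate for small, well-conditioned matrices. FitCovariance is a hypothetical name.

```java
// Sketch of (9.2.27): C_x = (A^T G_y A)^{-1} for diagonal G_y = diag(1/sigma_j^2).
// FitCovariance is a hypothetical name, not one of the book's classes.
final class FitCovariance {

    static double[][] covariance(double[][] a, double[] sigma) {
        int r = a[0].length;
        double[][] m = new double[r][r]; // A^T G_y A
        for (int j = 0; j < a.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            for (int k = 0; k < r; k++)
                for (int l = 0; l < r; l++) m[k][l] += a[j][k] * w * a[j][l];
        }
        return invert(m);
    }

    /** Gauss-Jordan inversion with partial pivoting; the input matrix is not modified. */
    static double[][] invert(double[][] m) {
        int r = m.length;
        double[][] aug = new double[r][2 * r]; // [m | I]
        for (int i = 0; i < r; i++) {
            System.arraycopy(m[i], 0, aug[i], 0, r);
            aug[i][r + i] = 1.0;
        }
        for (int p = 0; p < r; p++) {
            int best = p;
            for (int i = p + 1; i < r; i++) if (Math.abs(aug[i][p]) > Math.abs(aug[best][p])) best = i;
            double[] tmp = aug[p]; aug[p] = aug[best]; aug[best] = tmp;
            double piv = aug[p][p];
            for (int k = 0; k < 2 * r; k++) aug[p][k] /= piv;
            for (int i = 0; i < r; i++) {
                if (i == p) continue;
                double f = aug[i][p];
                for (int k = 0; k < 2 * r; k++) aug[i][k] -= f * aug[p][k];
            }
        }
        double[][] inv = new double[r][r];
        for (int i = 0; i < r; i++) System.arraycopy(aug[i], r, inv[i], 0, r);
        return inv;
    }

    public static void main(String[] args) {
        double[][] a = {{-1, 0}, {-1, -1}, {-1, -2}, {-1, -3}}; // straight line of Sect. 9.3
        double[][] c = covariance(a, new double[] {0.5, 0.2, 1.0, 0.5});
        double e1 = Math.sqrt(c[0][0]), e2 = Math.sqrt(c[1][1]);
        System.out.printf("dx1 = %.3f, dx2 = %.3f, rho = %.3f%n", e1, e2, c[0][1] / (e1 * e2));
        // dx1 = 0.307, dx2 = 0.222, rho = -0.830
    }
}
```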

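We can also use the result (9.2.26) to improve the original measurements $y$. By substituting (9.2.26) into (9.2.11) one obtains a vector of estimators of the measurement errors $\varepsilon$,

$$ \tilde\varepsilon = A\tilde x + c = -A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y c + c . \qquad (9.2.28) $$

These measurement errors can now be used to compute improved measured values,

$$ \tilde\eta = y - \tilde\varepsilon = y + A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y c - c , $$

$$ \tilde\eta = A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y c - a_0 . \qquad (9.2.29) $$

The $\tilde\eta$ are again linear in $y$. We can again use error propagation to determine the covariance matrix of the improved measurements,

$$ G_{\tilde\eta}^{-1} = \left[A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y\right] G_y^{-1} \left[A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} G_y\right]^{\mathrm T} , $$

$$ G_{\tilde\eta}^{-1} = A(A^{\mathrm T} G_y A)^{-1} A^{\mathrm T} = A G_{\tilde x}^{-1} A^{\mathrm T} . \qquad (9.2.30) $$

The improved measurements $\tilde\eta$ satisfy (9.2.1) if the unknowns are replaced by their estimators $\tilde x$.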
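Given $\tilde x$ and $C_{\tilde x}$, the improved measurements and their residual errors follow from $\tilde\eta = -a_0 - A\tilde x$ and (9.2.30). The Java sketch below is a minimal illustration; the class ImprovedMeasurements is hypothetical. Its main method plugs in the exact values of the straight-line example of Sect. 9.3, $\tilde x = (438.1, 734.6)/689$ and $C_{\tilde x} = (A'^{\mathrm T}A')^{-1}$.

```java
// Sketch of eta~ = -a0 - A x~ and of the residual errors sqrt(diag(A C_x A^T)), cf. (9.2.30).
// ImprovedMeasurements is a hypothetical name.
final class ImprovedMeasurements {

    static double[] improved(double[][] a, double[] a0, double[] x) {
        double[] eta = new double[a.length];
        for (int j = 0; j < a.length; j++) {
            double s = -a0[j];
            for (int k = 0; k < x.length; k++) s -= a[j][k] * x[k];
            eta[j] = s;
        }
        return eta;
    }

    /** Square roots of the diagonal elements of A C_x A^T. */
    static double[] errors(double[][] a, double[][] cx) {
        double[] err = new double[a.length];
        for (int j = 0; j < a.length; j++) {
            double v = 0.0;
            for (int k = 0; k < cx.length; k++)
                for (int l = 0; l < cx.length; l++) v += a[j][k] * cx[k][l] * a[j][l];
            err[j] = Math.sqrt(v);
        }
        return err;
    }

    public static void main(String[] args) {
        double[][] a = {{-1, 0}, {-1, -1}, {-1, -2}, {-1, -3}};
        double[] x = {438.1 / 689.0, 734.6 / 689.0};
        double[][] cx = {{65.0 / 689.0, -39.0 / 689.0}, {-39.0 / 689.0, 34.0 / 689.0}};
        double[] eta = improved(a, new double[4], x);
        double[] err = errors(a, cx);
        for (int j = 0; j < eta.length; j++) System.out.printf("%.3f +- %.2f%n", eta[j], err[j]);
        // 0.636 +- 0.31, 1.702 +- 0.17, 2.768 +- 0.26, 3.834 +- 0.45 (one per line)
    }
}
```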


9.3  Fitting  a  Straight  Line 

We will examine in detail the simple but in practice frequently occurring task of fitting a straight line to a set of measurements $y_j$ at various values $t_j$ of a so-called controlled variable $t$. The values of these variables will be assumed to be known exactly, i.e., without error. The variable $t_j$ could be, for example, the time at which an observation $y_j$ is made, or a temperature or voltage that is set in an experiment. (If $t$ also has an error, then fitting a line in the $(t, y)$ plane becomes a nonlinear problem. It will be treated in Sect. 9.10, Example 9.11.)

In the present case the relation (9.2.1) has the simple form

$$ \eta_j = y_j - \varepsilon_j = x_1 + x_2 t_j $$

or, using vector notation,

$$ \eta - x_1 - x_2 t = 0 . $$
We will attempt to determine the unknown parameters

$$ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} $$

from the measurements in Table 9.2. A comparison of our problem to Eqs. (9.2.2) through (9.2.6) gives $a_0 = 0$,

$$ A = -\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix} = -\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} , \qquad y = c = \begin{pmatrix} 1.4 \\ 1.5 \\ 3.7 \\ 4.1 \end{pmatrix} . $$

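The matrices $G_y$ and $H$ are found by substitution of the last line of Table 9.2 into (9.2.9) and (9.2.21),

$$ G_y = \begin{pmatrix} 4 & & & \\ & 25 & & \\ & & 1 & \\ & & & 4 \end{pmatrix} , \qquad H = \begin{pmatrix} 2 & & & \\ & 5 & & \\ & & 1 & \\ & & & 2 \end{pmatrix} . $$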


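One thus has

$$ A' = -\begin{pmatrix} 2 & 0 \\ 5 & 5 \\ 1 & 2 \\ 2 & 6 \end{pmatrix} , \qquad c' = \begin{pmatrix} 2.8 \\ 7.5 \\ 3.7 \\ 8.2 \end{pmatrix} , \qquad A'^{\mathrm T} c' = -\begin{pmatrix} 63.2 \\ 94.1 \end{pmatrix} , $$

$$ (A'^{\mathrm T} A')^{-1} = \begin{pmatrix} 34 & 39 \\ 39 & 65 \end{pmatrix}^{-1} = \frac{1}{689} \begin{pmatrix} 65 & -39 \\ -39 & 34 \end{pmatrix} = \begin{pmatrix} 0.0943 & -0.0566 \\ -0.0566 & 0.0493 \end{pmatrix} . $$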


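To invert the 2 × 2 matrix we use (A.6.8). The solution (9.2.25) is then

$$ \tilde x = \begin{pmatrix} 0.0943 & -0.0566 \\ -0.0566 & 0.0493 \end{pmatrix} \begin{pmatrix} 63.2 \\ 94.1 \end{pmatrix} = \begin{pmatrix} 0.636 \\ 1.066 \end{pmatrix} . $$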


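The covariance matrix for $\tilde x$ is

$$ C_{\tilde x} = G_{\tilde x}^{-1} = (A'^{\mathrm T} A')^{-1} = \begin{pmatrix} 0.0943 & -0.0566 \\ -0.0566 & 0.0493 \end{pmatrix} . $$

Its diagonal elements are the variances of $\tilde x_1$ and $\tilde x_2$, and their square roots are the errors,

$$ \Delta\tilde x_1 = 0.307 , \qquad \Delta\tilde x_2 = 0.222 . $$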
Table 9.2: Data for fitting a straight line.

| $j$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| $t_j$ | 0.0 | 1.0 | 2.0 | 3.0 |
| $y_j$ | 1.4 | 1.5 | 3.7 | 4.1 |
| $\sigma_j$ | 0.5 | 0.2 | 1.0 | 0.5 |

For the correlation coefficient between $\tilde x_1$ and $\tilde x_2$ one finds

$$ \rho = \frac{-0.0566}{0.307 \cdot 0.222} = -0.830 . $$


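The improved measurements are

$$ \tilde\eta = -A\tilde x = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 0.636 \\ 1.066 \end{pmatrix} = \begin{pmatrix} 0.636 \\ 1.702 \\ 2.768 \\ 3.834 \end{pmatrix} . $$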


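They lie on a line given by $\tilde\eta = -A\tilde x$, which of course is in general different from the “true” solution. The “residual errors” of $\tilde\eta$ can be obtained with (9.2.30),

$$ G_{\tilde\eta}^{-1} = A G_{\tilde x}^{-1} A^{\mathrm T} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 0.0943 & -0.0566 \\ -0.0566 & 0.0493 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{pmatrix} = \begin{pmatrix} 0.0943 & 0.0377 & -0.0189 & -0.0755 \\ 0.0377 & 0.0305 & 0.0232 & 0.0160 \\ -0.0189 & 0.0232 & 0.0653 & 0.1074 \\ -0.0755 & 0.0160 & 0.1074 & 0.1988 \end{pmatrix} . $$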


The square roots of the diagonal elements are

$$ \Delta\tilde\eta_1 = 0.31 , \qquad \Delta\tilde\eta_2 = 0.17 , \qquad \Delta\tilde\eta_3 = 0.26 , \qquad \Delta\tilde\eta_4 = 0.45 . $$


The fit procedure, in which more measurements (four) were used than were necessary to determine the two unknowns, has noticeably reduced the individual errors of the measurements in comparison to the original values $\sigma_j$. Finally we will compute the value of the minimum function (9.2.16),

$$ M = \tilde\varepsilon^{\mathrm T} G_y \tilde\varepsilon = (y - \tilde\eta)^{\mathrm T} G_y (y - \tilde\eta) = \sum_{j=1}^{4} \left( \frac{y_j - \tilde\eta_j}{\sigma_j} \right)^2 = 4.507 . $$
With this result we can carry out a $\chi^2$-test on the goodness-of-fit of a straight line to our data. Since we started with $n = 4$ measured values and have determined $r = 2$ unknown parameters, we still have $n - r = 2$ degrees of freedom available (cf. Sect. 9.7). If we choose a significance level of 5 %, we find from Table I.7 for two degrees of freedom $\chi^2_{0.95} = 5.99$. Since $M = 4.507$ is smaller than this value, there is no reason to reject the hypothesis of a straight line.
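The arithmetic of this test can be checked with a short Java sketch. It makes one assumption worth stating: for exactly two degrees of freedom the $\chi^2$ distribution function is $F(x) = 1 - e^{-x/2}$, so the quantile has the closed form $\chi^2_p = -2\ln(1-p)$. This shortcut does not hold for other numbers of degrees of freedom; there one needs Table I.7 or a numerical quantile routine. GoodnessOfFit is a hypothetical name.

```java
// Sketch: minimum function M for diagonal G_y, and the chi^2 quantile for f = 2 only.
// GoodnessOfFit is a hypothetical name.
final class GoodnessOfFit {

    /** M = sum_j ((y_j - eta_j) / sigma_j)^2, cf. (9.2.16). */
    static double minimumFunction(double[] y, double[] eta, double[] sigma) {
        double m = 0.0;
        for (int j = 0; j < y.length; j++) {
            double d = (y[j] - eta[j]) / sigma[j];
            m += d * d;
        }
        return m;
    }

    /** chi^2 quantile valid ONLY for two degrees of freedom, where F(x) = 1 - exp(-x/2). */
    static double chi2QuantileTwoDof(double p) {
        return -2.0 * Math.log(1.0 - p);
    }

    public static void main(String[] args) {
        double[] t = {0.0, 1.0, 2.0, 3.0};
        double[] y = {1.4, 1.5, 3.7, 4.1};
        double[] sigma = {0.5, 0.2, 1.0, 0.5};
        double[] eta = new double[t.length];
        for (int j = 0; j < t.length; j++) eta[j] = 0.636 + 1.066 * t[j]; // fitted line of Sect. 9.3
        double m = minimumFunction(y, eta, sigma);
        double q = chi2QuantileTwoDof(0.95);
        System.out.printf("M = %.3f, chi2_0.95(2) = %.2f, reject = %b%n", m, q, m > q);
        // M = 4.507, chi2_0.95(2) = 5.99, reject = false
    }
}
```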

The results of the fit are shown in Fig. 9.2. In Fig. 9.2a the measurements $y_j$ are shown as a function of the variable $t$. The vertical bars give the measurement errors; they cover the range $y_j \pm \sigma_j$. The plotted line corresponds to the result $\tilde x_1$, $\tilde x_2$. The improved measurements lie on this line. They are shown in Fig. 9.2b together with the residual errors $\Delta\tilde\eta_j$. In order to illustrate the accuracy of the estimates $\tilde x_1$, $\tilde x_2$, we consider the covariance matrix $C_{\tilde x}$. It determines a covariance ellipse (Sect. 5.10) in the plane spanned by the variables $x_1$, $x_2$. This ellipse is shown in Fig. 9.2c. Points on the ellipse correspond to fits of equal probability. Each of these points determines a line in the $(t, y)$ plane. Some of the points are indicated in Fig. 9.2c, and the corresponding lines are plotted in Fig. 9.2d. The points on the covariance ellipse thus correspond to a bundle of lines. The line determined by the “true” values of the unknowns lies in this bundle with the probability $1 - e^{-1/2}$; cf. (5.10.18).

Fig. 9.2: Fit of a straight line to data from Table 9.2. (a) Original measured values and errors; (b) improved measurement values and residual errors; (c) covariance ellipse for the fitted quantities $\tilde x_1$, $\tilde x_2$; (d) various lines corresponding to individual points on the covariance ellipse.
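The bundle of lines in Fig. 9.2d can be generated from points on the covariance ellipse. The Java sketch below makes a choice of its own, not taken from the text: it parametrizes the ellipse as $\tilde x + L(\cos\varphi, \sin\varphi)^{\mathrm T}$, where $C_{\tilde x} = LL^{\mathrm T}$ is the 2 × 2 Cholesky factor. Every such point satisfies $(x - \tilde x)^{\mathrm T} C_{\tilde x}^{-1} (x - \tilde x) = 1$. CovarianceEllipse is a hypothetical name.

```java
// Sketch: points on the covariance ellipse x~ + L (cos phi, sin phi), with C = L L^T.
// Each point (x1, x2) defines one line y = x1 + x2 t of the bundle. Hypothetical class name.
final class CovarianceEllipse {

    static double[][] points(double[] xFit, double[][] c, int count) {
        double l11 = Math.sqrt(c[0][0]);               // 2x2 Cholesky factor L
        double l21 = c[1][0] / l11;
        double l22 = Math.sqrt(c[1][1] - l21 * l21);
        double[][] pts = new double[count][2];
        for (int i = 0; i < count; i++) {
            double phi = 2.0 * Math.PI * i / count;
            double u = Math.cos(phi), v = Math.sin(phi);
            pts[i][0] = xFit[0] + l11 * u;
            pts[i][1] = xFit[1] + l21 * u + l22 * v;
        }
        return pts;
    }

    public static void main(String[] args) {
        double[] xFit = {0.636, 1.066};
        double[][] c = {{0.0943, -0.0566}, {-0.0566, 0.0493}};
        for (double[] p : points(xFit, c, 8)) {
            System.out.printf("y = %.3f + %.3f t%n", p[0], p[1]); // one line of the bundle
        }
        System.out.printf("P = %.3f%n", 1.0 - Math.exp(-0.5)); // P = 0.393
    }
}
```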


9.4 Algorithms for Fitting Linear Functions of the Unknowns

The starting point of the ideas in Sect. 9.2 was the assumption (9.2.6) of a linear relationship between the “true” values $\eta$ of the measured quantities $y$ and the unknowns $x$. We will write this relation in the form

$$ \eta = h(x) = -a_0 - Ax , \qquad (9.4.1) $$

or in terms of components,

$$ \eta_j = h_j(x) = -a_{0j} - A_{j1}x_1 - A_{j2}x_2 - \dots - A_{jr}x_r . \qquad (9.4.2) $$

Often it is useful to consider the index $j$ as specifying that the measurement $y_j$ corresponds to a value $t_j$ of a controlled variable, which is taken to be known without error. Then (9.4.2) can be written in the form

$$ \eta_j = h(x, t_j) . \qquad (9.4.3) $$

This relation describes a curve in the $(t, \eta)$ plane. It is characterized by the parameters $x$. The determination of the parameters is thus equivalent to fitting a curve to the measurements $y_j = y(t_j)$. Usually the individual measurements $y_j$ are uncorrelated. The weight matrix $G_y$ is then diagonal, and its Cholesky decomposition (9.2.20) is given by the diagonal matrix $H$ of (9.2.21). One then has simply

$$ A'_{jk} = A_{jk}/\sigma_j , \qquad c'_j = c_j/\sigma_j ; $$

see (9.2.22).


9.4.1  Fitting  a  Polynomial 

As a particularly simple but useful example of a function for (9.4.3), let us consider the relation

$$ \eta_j = h_j = x_1 + x_2 t_j + x_3 t_j^2 + \dots + x_r t_j^{r-1} . \qquad (9.4.4) $$

This is a polynomial in $t_j$, but it is linear in the unknowns $x$. The special case $r = 2$ has been treated in detail in Sect. 9.3.
A comparison of (9.4.4) with (9.4.2) yields directly

$$ a_{0j} = 0 , \qquad A_{jl} = -t_j^{\,l-1} , $$

or more completely,

$$ a_0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , \qquad A = -\begin{pmatrix} 1 & t_1 & t_1^2 & \dots & t_1^{r-1} \\ 1 & t_2 & t_2^2 & \dots & t_2^{r-1} \\ \vdots & & & & \vdots \\ 1 & t_n & t_n^2 & \dots & t_n^{r-1} \end{pmatrix} . $$


The  class  LsqPol  performs  the  fit  of  a  polynomial  to  data. 


Example  9.2:  Fitting  of  various  polynomials 

It is often interesting to fit polynomials of various orders to measured data, as long as the number of degrees of freedom of the fit is at least one, i.e., $n \ge r + 1$. As a numerical example we will use measurements from an elementary particle physics experiment. Consider an investigation of elastic scattering of negative K mesons on protons with a fixed K meson energy. The distribution of the cosine of the scattering angle $\theta$ in the center-of-mass system of the collision is characteristic of the angular momentum of possible intermediate states in the collision process. If, in particular, the distribution is considered as a polynomial in $\cos\theta$, the order of the polynomial can be used to determine the spin quantum numbers of such intermediate states.

The measured values $y_j$ ($j = 1, 2, \dots, 10$) are simply the numbers of collisions for which $\cos\theta$ was observed in a small interval around $t_j = \cos\theta_j$. As measurement errors the statistical errors were used, i.e., the square roots of the numbers of observations. The data are given in Table 9.3. The results of the fit of polynomials of various orders are summarized in Table 9.4 and Fig. 9.3. With the $\chi^2$-test we can check successively whether a polynomial of order zero, one, ... gives a good fit to the data.

We see that the first two hypotheses (a constant and a straight line) are not in agreement with the experimental data. This can be seen in Fig. 9.3 and is also reflected in the values of the minimum function. The hypothesis $r = 3$, a second-order polynomial, gives qualitative agreement. Most of the measurements, however, do not fall on the fitted parabola within the error bars. The $\chi^2$-test rejects this hypothesis even at a significance level of 0.0001. For the hypotheses $r = 4$, 5, and 6, however, the agreement is very good. The fitted curves go through the error bars and are almost identical. The $\chi^2$-test does not call for a rejection of the hypothesis even at $\alpha = 0.5$. We can therefore conclude that a third-order polynomial is sufficient to describe the data. An even more careful investigation of the question of which polynomial order is needed to describe the data is possible with orthogonal polynomials; cf. Sect. 12.1. ■




9  The  Method  of  Least  Squares 


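Table 9.3: Data for Example 9.2. One has $\sigma_j = \sqrt{y_j}$.

| $j$ | $t_j = \cos\theta_j$ | $y_j$ |
|---|---|---|
| 1 | -0.9 | 81 |
| 2 | -0.7 | 50 |
| 3 | -0.5 | 35 |
| 4 | -0.3 | 27 |
| 5 | -0.1 | 26 |
| 6 | 0.1 | 60 |
| 7 | 0.3 | 106 |
| 8 | 0.5 | 189 |
| 9 | 0.7 | 318 |
| 10 | 0.9 | 520 |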

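Table 9.4: Summary of results from Example 9.2 ($n = 10$ measured points, $r$ parameters, $f = n - r$ degrees of freedom).

| $r$ | $\tilde x_1$ | $\tilde x_2$ | $\tilde x_3$ | $\tilde x_4$ | $\tilde x_5$ | $\tilde x_6$ | $f$ | $M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 57.85 | | | | | | 9 | 833.55 |
| 2 | 82.66 | 99.10 | | | | | 8 | 585.45 |
| 3 | 47.27 | 185.96 | 273.61 | | | | 7 | 36.41 |
| 4 | 37.94 | 126.55 | 312.02 | 137.59 | | | 6 | 2.85 |
| 5 | 39.62 | 119.10 | 276.49 | 151.91 | 52.60 | | 5 | 1.68 |
| 6 | 39.88 | 121.39 | 273.19 | 136.58 | 56.90 | 16.72 | 4 | 1.66 |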
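The entries of Table 9.4 can be reproduced in outline with a weighted polynomial fit through the normal equations. The Java sketch below is a minimal illustration under stated assumptions: $\sigma_j = \sqrt{y_j}$ as in Table 9.3, and plain Gaussian elimination, which is acceptable here only because $|t_j| \le 0.9$ and $r \le 6$. The book itself recommends orthogonal polynomials (Sect. 12.1) for a more careful treatment. PolynomialFit is a hypothetical name; the book's own class is LsqPol.

```java
import java.util.Arrays;

// Sketch of a weighted polynomial fit (9.4.4) via the normal equations.
// PolynomialFit is a hypothetical name (the book's class for this task is LsqPol).
final class PolynomialFit {

    /** Returns {x_1, ..., x_r, M} for eta(t) = x_1 + x_2 t + ... + x_r t^(r-1). */
    static double[] fit(double[] t, double[] y, double[] sigma, int r) {
        double[][] nm = new double[r][r]; // A'^T A' with A_jl = -t_j^(l-1)
        double[] b = new double[r];       // -A'^T c' (c = y since a0 = 0)
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            double[] p = new double[r];   // powers t_j^(l-1)
            p[0] = 1.0;
            for (int l = 1; l < r; l++) p[l] = p[l - 1] * t[j];
            for (int k = 0; k < r; k++) {
                b[k] += w * p[k] * y[j];
                for (int l = 0; l < r; l++) nm[k][l] += w * p[k] * p[l];
            }
        }
        double[] x = gauss(nm, b);
        double m = 0.0;
        for (int j = 0; j < t.length; j++) {
            double eta = 0.0;
            for (int l = r - 1; l >= 0; l--) eta = eta * t[j] + x[l]; // Horner scheme
            double d = (y[j] - eta) / sigma[j];
            m += d * d;
        }
        double[] out = Arrays.copyOf(x, r + 1);
        out[r] = m;
        return out;
    }

    /** Gaussian elimination with partial pivoting; overwrites m and b. */
    static double[] gauss(double[][] m, double[] b) {
        int r = b.length;
        for (int p = 0; p < r; p++) {
            int best = p;
            for (int i = p + 1; i < r; i++) if (Math.abs(m[i][p]) > Math.abs(m[best][p])) best = i;
            double[] rowTmp = m[p]; m[p] = m[best]; m[best] = rowTmp;
            double bTmp = b[p]; b[p] = b[best]; b[best] = bTmp;
            for (int i = p + 1; i < r; i++) {
                double f = m[i][p] / m[p][p];
                for (int k = p; k < r; k++) m[i][k] -= f * m[p][k];
                b[i] -= f * b[p];
            }
        }
        double[] x = new double[r];
        for (int i = r - 1; i >= 0; i--) {
            double s = b[i];
            for (int k = i + 1; k < r; k++) s -= m[i][k] * x[k];
            x[i] = s / m[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] t = {-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9};
        double[] y = {81, 50, 35, 27, 26, 60, 106, 189, 318, 520};
        double[] sigma = new double[y.length];
        for (int j = 0; j < y.length; j++) sigma[j] = Math.sqrt(y[j]);
        for (int r = 1; r <= 6; r++) {
            double[] res = fit(t, y, sigma, r);
            System.out.printf("r = %d, f = %d, M = %.2f%n", r, t.length - r, res[r]);
        }
    }
}
```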

9.4.2  Fit  of  an  Arbitrary  Linear  Function 

The matrix $A$ and the vector $c$ enter into the solution of the problem of Sect. 9.2. They depend on the form of the function to be fitted and must therefore be provided by the user. (In Sect. 9.4.1 the function was known, so the user did not have to worry about computing $A$ and $c$.) The class LsqLin performs the fit of an arbitrary linear function to data.


Example  9.3:  Fitting  a  proportional  relation 

Suppose from the construction of an experiment it is known that the true value $\eta_j$ of the measurement $y_j$ is directly proportional to the value of the controlled variable $t_j$:

$$ \eta_j = x_1 t_j . $$




Fig. 9.3: Fit of polynomials of various orders (0, 1, ..., 5) to the data from Example 9.2.


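The constant of proportionality $x_1$ is to be determined from the measurements. This relation is simpler than a first-order polynomial, which contains two constants. A comparison with (9.4.2) gives $a_{0j} = 0$ and $A_{j1} = -t_j$, and thus

$$ c = y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} , \qquad A = -\begin{pmatrix} t_1 \\ \vdots \\ t_n \end{pmatrix} . $$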
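With a single column $A = -(t_1, \dots, t_n)^{\mathrm T}$, the normal equations reduce to one line, $\tilde x_1 = \sum (t_j y_j/\sigma_j^2) / \sum (t_j^2/\sigma_j^2)$, with error $(\sum t_j^2/\sigma_j^2)^{-1/2}$ from (9.2.27). The Java sketch below illustrates this. ProportionalFit is a hypothetical name, and the numbers in main are invented for illustration; they are not the data of Fig. 9.4.

```java
// Sketch of the proportional fit eta_j = x1 t_j (a0 = 0, A_j1 = -t_j).
// ProportionalFit is a hypothetical name; the data in main are illustrative only.
final class ProportionalFit {

    /** Returns {x1, error of x1}. */
    static double[] fit(double[] t, double[] y, double[] sigma) {
        double stt = 0.0, sty = 0.0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            stt += w * t[j] * t[j]; // A'^T A'
            sty += w * t[j] * y[j]; // -A'^T c'
        }
        return new double[] {sty / stt, 1.0 / Math.sqrt(stt)};
    }

    public static void main(String[] args) {
        double[] r = fit(new double[] {1, 2, 3}, new double[] {2.1, 3.9, 6.0}, new double[] {0.1, 0.1, 0.1});
        System.out.printf("x1 = %.3f +- %.3f%n", r[0], r[1]); // x1 = 1.993 +- 0.027
    }
}
```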


Shown in Fig. 9.4 are the results of the fit of a line through the origin, i.e., a proportionality, and the fit of a first-order polynomial. The value of the minimum function is clearly smaller in the case of the general line, and the fit is visibly better. The number of degrees of freedom, however, is smaller by one, and the desired constant of proportionality is not determined. ■

Fig. 9.4: Fit of a proportional relation (above) and a first-order polynomial (below) to data.

9.5  Indirect  Measurements:  Nonlinear  Case 

If the relation between the $n$-vector $\eta$ of true values of the measured quantities $y$ and the $r$-vector $x$ of unknowns is given by a function,

$$ \eta = h(x) , $$

that is not linear in $x$ as (9.4.1) is, then our previous procedure cannot be used directly to determine the unknowns.

Instead of (9.2.2) one has

$$ f_j(x, \eta) = \eta_j - h_j(x) = 0 , \qquad (9.5.1) $$

or in vector notation,

$$ f(x, \eta) = 0 . \qquad (9.5.2) $$


We can, however, relate this situation to the linear case if we expand the $f_j$ in a Taylor series and keep only the terms up to first order. We carry out the expansion about the point $x_0 = (x_{10}, x_{20}, \dots, x_{r0})$, which is a first approximation for the unknowns obtained in some way,

$$ f_j(x, \eta) = f_j(x_0, \eta) + \left( \frac{\partial f_j}{\partial x_1} \right)_{x_0} (x_1 - x_{10}) + \dots + \left( \frac{\partial f_j}{\partial x_r} \right)_{x_0} (x_r - x_{r0}) . \qquad (9.5.3) $$


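If we now define

$$ \xi = x - x_0 = \begin{pmatrix} x_1 - x_{10} \\ x_2 - x_{20} \\ \vdots \\ x_r - x_{r0} \end{pmatrix} , \qquad (9.5.4) $$

$$ a_{jl} = \left( \frac{\partial f_j}{\partial x_l} \right)_{x_0} = -\left( \frac{\partial h_j}{\partial x_l} \right)_{x_0} , \qquad A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1r} \\ a_{21} & a_{22} & \dots & a_{2r} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nr} \end{pmatrix} , \qquad (9.5.5) $$

$$ c_j = f_j(x_0, y) = y_j - h_j(x_0) , \qquad c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} , \qquad (9.5.6) $$

and use the relation (9.2.10), then we obtain

$$ f_j(x_0, \eta) = f_j(x_0, y - \varepsilon) = f_j(x_0, y) - \varepsilon_j . \qquad (9.5.7) $$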


Thus we can now write the system of equations (9.5.2) in the form

$$ f = A\xi + c - \varepsilon = 0 , \qquad \varepsilon = A\xi + c . \qquad (9.5.8) $$

The least-squares condition (9.2.16) is then

$$ M = (c + A\xi)^{\mathrm T} G_y (c + A\xi) = \min , \qquad (9.5.9) $$
in complete analogy to (9.2.19). Using the quantities defined in (9.2.20) through (9.2.22) we can obtain the solution directly from (9.2.24),

$$ \tilde\xi = -A'^{+} c' . \qquad (9.5.10) $$

Corresponding to (9.5.4) one can use $\tilde\xi$ to find a better approximation

$$ x_1 = x_0 + \tilde\xi \qquad (9.5.11) $$

and compute new values of $A$ and $c$ at the point $x_1$. A solution $\tilde\xi$ (9.5.10) of (9.5.9) with these values gives $x_2 = x_1 + \tilde\xi$, etc. (Here the subscripts of $x_0, x_1, x_2, \dots$ count the iteration steps, not the components of $x$.) This iterative procedure can be terminated when the minimum function in the last step has not significantly decreased in comparison to the result of the previous step.
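The iteration (9.5.10)-(9.5.11), often called the Gauss-Newton method, can be sketched in Java for a two-parameter model. The sketch makes the following assumptions:

- the model is the hypothetical $h(x, t) = x_1 \exp(-x_2 t)$, with analytic derivatives;
- the errors are uncorrelated;
- the data in main are noise-free and invented for illustration.

Deliberately, there is no step-size control. The loop simply stops when $M$ no longer decreases significantly, or increases, exactly the weakness addressed in Sect. 9.6. GaussNewtonSketch is a hypothetical name.

```java
// Sketch of the iteration (9.5.10)-(9.5.11) without safeguards.
// Hypothetical model for illustration: h(x, t) = x1 * exp(-x2 * t).
final class GaussNewtonSketch {

    static double h(double[] x, double t) {
        return x[0] * Math.exp(-x[1] * t);
    }

    /** Row j of A: a_jl = -dh/dx_l at x, cf. (9.5.5). */
    static double[] rowA(double[] x, double t) {
        double e = Math.exp(-x[1] * t);
        return new double[] {-e, x[0] * t * e};
    }

    static double minimumFunction(double[] x, double[] t, double[] y, double[] s) {
        double m = 0.0;
        for (int j = 0; j < t.length; j++) {
            double d = (y[j] - h(x, t[j])) / s[j];
            m += d * d;
        }
        return m;
    }

    /** Solves the 2x2 normal equations (A'^T A') xi = -A'^T c' at the expansion point x. */
    static double[] step(double[] x, double[] t, double[] y, double[] s) {
        double n00 = 0, n01 = 0, n11 = 0, b0 = 0, b1 = 0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (s[j] * s[j]);
            double[] a = rowA(x, t[j]);
            double c = y[j] - h(x, t[j]); // c_j of (9.5.6)
            n00 += w * a[0] * a[0];
            n01 += w * a[0] * a[1];
            n11 += w * a[1] * a[1];
            b0 -= w * a[0] * c;
            b1 -= w * a[1] * c;
        }
        double det = n00 * n11 - n01 * n01;
        return new double[] {(n11 * b0 - n01 * b1) / det, (n00 * b1 - n01 * b0) / det};
    }

    static double[] fit(double[] x0, double[] t, double[] y, double[] s, int maxIter) {
        double[] x = x0.clone();
        double m = minimumFunction(x, t, y, s);
        for (int it = 0; it < maxIter; it++) {
            double[] xi = step(x, t, y, s);
            double[] xNew = {x[0] + xi[0], x[1] + xi[1]}; // (9.5.11)
            double mNew = minimumFunction(xNew, t, y, s);
            if (mNew > m) break; // M increased: stop, keep the previous point
            boolean converged = m - mNew <= 1e-10 * (1.0 + m);
            x = xNew;
            m = mNew;
            if (converged) break; // no significant decrease any more
        }
        return x;
    }

    public static void main(String[] args) {
        double[] t = new double[10], y = new double[10], s = new double[10];
        for (int j = 0; j < 10; j++) {
            t[j] = j;
            y[j] = 2.0 * Math.exp(-0.5 * j); // noise-free illustrative data
            s[j] = 0.1;
        }
        double[] x = fit(new double[] {1.8, 0.45}, t, y, s, 50);
        System.out.printf("x1 = %.6f, x2 = %.6f%n", x[0], x[1]);
    }
}
```

Starting close to the solution, as here, the iteration converges quickly. The convergence caveats discussed next apply in full to poorer starting values.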

There is, however, no guarantee that this procedure converges. Heuristically, one expects convergence to be more likely the better the Taylor series, truncated after the linear term, approximates $f$ in the region over which $x$ is varied in the procedure. This is at least the region between $x_0$ and the solution $\tilde x$. (Intermediate steps can also lie outside of this region.) It is therefore important, particularly in highly nonlinear problems, to start from a good first approximation.

If the solution $\tilde x = x_k = x_{k-1} + \tilde\xi$ is reached in $k$ steps, then it can be expressed as a linear function of $\tilde\xi$. Using error propagation one finds that the covariance matrices of $\tilde x$ and $\tilde\xi$ are then identical, and that

$$ C_{\tilde x} = G_{\tilde x}^{-1} = (A^{\mathrm T} G_y A)^{-1} = (A'^{\mathrm T} A')^{-1} . \qquad (9.5.12) $$

The covariance matrix loses its validity, however, if the linear approximation (9.5.3) is not a good description in the region $\tilde x_i \pm \Delta\tilde x_i$, $i = 1, \dots, r$. (Here $\Delta\tilde x_i = \sqrt{(C_{\tilde x})_{ii}}$.)


9.6  Algorithms  for  Fitting  Nonlinear  Functions 

It is sometimes useful to set one or several of the $r$ unknown parameters equal to given values, i.e., to treat them as constants and not as adjustable parameters. This can clearly be done by means of a corresponding definition of $f$ in (9.5.1). For the user, however, it is more convenient to write only one subprogram that computes the function $f$ for the $r$ original parameters and, when needed, to communicate to the program that the number of adjustable parameters is to be reduced from $r$ to $r'$. Of course a list with $r$ elements must also be given, in which one specifies which parameters $x_i$ are to be held constant ($\ell_i = 0$) and which should remain adjustable ($\ell_i = 1$).
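One simple way to honor such a list is to delete the columns of $A$ that belong to fixed parameters, solve the reduced problem with $r'$ unknowns, and re-insert zeros into the step $\tilde\xi$ for the fixed parameters. The Java sketch below shows only this bookkeeping. It is an assumption about how the list could be used, not a description of the book's programs, and it assumes every entry of the list is 0 or 1. FixedParameters is a hypothetical name.

```java
// Sketch of handling the list l_i (0 = fixed, 1 = adjustable) by column selection.
// FixedParameters is a hypothetical name; entries of list are assumed to be 0 or 1.
final class FixedParameters {

    /** Keeps only the columns of A with list[l] == 1; the result has r' columns. */
    static double[][] reduce(double[][] a, int[] list) {
        int rPrime = 0;
        for (int v : list) rPrime += v;
        double[][] out = new double[a.length][rPrime];
        for (int j = 0; j < a.length; j++)
            for (int l = 0, k = 0; l < list.length; l++)
                if (list[l] == 1) out[j][k++] = a[j][l];
        return out;
    }

    /** Expands a reduced step (length r') to length r, with 0 for every fixed parameter. */
    static double[] expand(double[] xiReduced, int[] list) {
        double[] xi = new double[list.length];
        for (int l = 0, k = 0; l < list.length; l++)
            if (list[l] == 1) xi[l] = xiReduced[k++];
        return xi;
    }

    public static void main(String[] args) {
        int[] list = {1, 0, 1}; // x2 is held constant
        double[][] reduced = reduce(new double[][] {{1, 2, 3}, {4, 5, 6}}, list);
        System.out.println(java.util.Arrays.deepToString(reduced)); // [[1.0, 3.0], [4.0, 6.0]]
        System.out.println(java.util.Arrays.toString(expand(new double[] {7, 8}, list))); // [7.0, 0.0, 8.0]
    }
}
```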

Two more difficulties come up when implementing the considerations of the previous section in a program.

On the one hand, the elements of the matrix $A$ must be found by constructing the function to be fitted and differentiating it with respect to the parameters. Of course it is particularly convenient for the user if these derivatives need not be programmed by hand but can instead be left to a routine for numerical differentiation. In Sect. E.1 we provide such a subroutine, which we will call from the program discussed below. Numerical differentiation implies, however, a loss of precision and an increase in computing time. In addition, the method can fail. Our programs communicate such an occurrence by means of an output parameter. Thus in some cases the user will be forced to program the derivatives by hand.
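A simple form of numerical differentiation uses central differences, $a_{jl} \approx -[h(x + \delta e_l, t_j) - h(x - \delta e_l, t_j)]/2\delta$. The Java sketch below shows this idea and its trade-off: the step $\delta$ must be small enough to limit the $O(\delta^2)$ truncation error, but large enough to avoid cancellation. The relative step $10^{-5}\max(|x_l|, 1)$ is an assumption of this sketch. This is not the book's AuxDri routine, whose algorithm and interface are described in Sect. E.1; NumericalJacobian and its Model interface are hypothetical.

```java
// Sketch of computing A numerically by central differences (not the book's AuxDri).
// NumericalJacobian and its Model interface are hypothetical names.
final class NumericalJacobian {

    interface Model {
        double h(double[] x, double t);
    }

    /** A_jl = -dh(x, t_j)/dx_l, central differences with step 1e-5 * max(|x_l|, 1). */
    static double[][] matrixA(Model m, double[] x, double[] t) {
        int n = t.length, r = x.length;
        double[][] a = new double[n][r];
        double[] xp = x.clone(), xm = x.clone();
        for (int l = 0; l < r; l++) {
            double d = 1e-5 * Math.max(Math.abs(x[l]), 1.0);
            xp[l] = x[l] + d;
            xm[l] = x[l] - d;
            for (int j = 0; j < n; j++) a[j][l] = -(m.h(xp, t[j]) - m.h(xm, t[j])) / (2.0 * d);
            xp[l] = x[l]; // restore before the next parameter
            xm[l] = x[l];
        }
        return a;
    }

    public static void main(String[] args) {
        Model gauss = (p, tt) -> p[0] * Math.exp(-(tt - p[1]) * (tt - p[1]) / (2.0 * p[2] * p[2]));
        double[][] a = matrixA(gauss, new double[] {1.0, 1.25, 0.4}, new double[] {1.0, 1.5});
        System.out.println(java.util.Arrays.deepToString(a));
    }
}
```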

The second difficulty is related to the fact that the minimum function

$$ M = (y - h(x))^{\mathrm T} G_y (y - h(x)) \qquad (9.6.1) $$

is no longer a simple quadratic form of the unknowns like (9.2.17) and (9.2.23). One consequence of this is that the position of the minimum $\tilde x$ cannot be reached in a single step. In addition, the convergence of the iterative procedure strongly depends on whether the first approximation $x_0$ is in a region where the minimum function is sufficiently similar to a quadratic form. The determination of a good first approximation must be handled according to the given problem. Some examples are given below. If one constructs the iterative procedure as indicated in the previous section, then it can easily happen that the minimum function does not decrease with every step. In order to ensure convergence despite this, two methods can be applied, which are described in Sects. 9.6.1 and 9.6.2. The first (reduction of step size) is simpler and faster. The second (the Marquardt procedure), however, has a larger region of convergence. We give programs for both methods, but recommend applying the Marquardt procedure in cases of doubt.

9.6.1  Iteration  with  Step-Size  Reduction 

As mentioned, the inequality

$$ M(x_i) = M(x_{i-1} + \tilde\xi) < M(x_{i-1}) \qquad (9.6.2) $$

does not hold in every case for the result $x_i$ of step $i$. The following consideration helps us in handling such steps. Let us consider the expression $M(x_{i-1} + s\tilde\xi)$ as a function of the quantity $s$ with $0 \le s \le 1$. If we replace $\tilde\xi$ by $s\tilde\xi$ in (9.5.9), then we obtain

$$ M = (c + sA\tilde\xi)^{\mathrm T} G_y (c + sA\tilde\xi) = (c' + sA'\tilde\xi)^2 . $$
Differentiating with respect to $s$ gives

$$ M' = 2 (c' + sA'\tilde\xi)^{\mathrm T} A'\tilde\xi , $$

or, with $A'^{\mathrm T} c' = -A'^{\mathrm T} A' \tilde\xi$, which follows from (9.5.10) (the normal equations, cf. (9.2.25)),

$$ M' = 2 (s - 1)\, \tilde\xi^{\mathrm T} A'^{\mathrm T} A' \tilde\xi , $$

and thus

$$ M'(s = 0) < 0 , $$

if $A'^{\mathrm T} A' = A^{\mathrm T} G_y A$ is positive definite and $\tilde\xi \ne 0$. (This is always the case near the minimum.) The matrix $A'^{\mathrm T} A'$ gives the curvature of the function $M$ in the space spanned by the unknowns $x_1, \dots, x_r$. For only one such unknown, the region of positive curvature is the region between the points of inflection around the minimum (cf. Fig. 10.1). Since $M'$ is continuous in $s$, there exists a value $\lambda > 0$ such that

$$ M'(s) < 0 , \qquad 0 \le s < \lambda . $$

After an iteration $i$ for which (9.6.2) does not hold, one multiplies $\tilde\xi$ by a number $s$, e.g., $s = 1/2$, and checks whether

$$ M(x_{i-1} + s\tilde\xi) < M(x_{i-1}) . $$

If this is the case, then one sets $x_i = x_{i-1} + s\tilde\xi$. If it is not the case, then one multiplies again with $s$, and so forth.
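The halving rule just described can be added to the plain iteration in a few lines. The Java sketch below is not the book's LsqNon; it is a minimal illustration under these assumptions:

- the hypothetical model $h(x, t) = x_1 \exp(-x_2 t)$ with uncorrelated errors;
- $s = 1/2$, applied repeatedly until $M$ decreases or the step becomes negligible;
- noise-free illustrative data, with a starting point farther from the solution than in the unsafeguarded sketch.

StepSizeReduction is a hypothetical name.

```java
// Sketch of iteration with step-size reduction (Sect. 9.6.1), not the book's LsqNon.
// Hypothetical model: h(x, t) = x1 * exp(-x2 * t), uncorrelated errors.
final class StepSizeReduction {

    static double h(double[] x, double t) {
        return x[0] * Math.exp(-x[1] * t);
    }

    /** Minimum function (9.6.1) for diagonal G_y. */
    static double m(double[] x, double[] t, double[] y, double[] sigma) {
        double sum = 0.0;
        for (int j = 0; j < t.length; j++) {
            double d = (y[j] - h(x, t[j])) / sigma[j];
            sum += d * d;
        }
        return sum;
    }

    /** Full linearized step from (A'^T A') xi = -A'^T c' at x, cf. (9.5.10). */
    static double[] fullStep(double[] x, double[] t, double[] y, double[] sigma) {
        double n00 = 0, n01 = 0, n11 = 0, b0 = 0, b1 = 0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            double e = Math.exp(-x[1] * t[j]);
            double a0 = -e, a1 = x[0] * t[j] * e; // a_jl = -dh/dx_l
            double c = y[j] - h(x, t[j]);
            n00 += w * a0 * a0;
            n01 += w * a0 * a1;
            n11 += w * a1 * a1;
            b0 -= w * a0 * c;
            b1 -= w * a1 * c;
        }
        double det = n00 * n11 - n01 * n01;
        return new double[] {(n11 * b0 - n01 * b1) / det, (n00 * b1 - n01 * b0) / det};
    }

    static double[] fit(double[] x0, double[] t, double[] y, double[] sigma, int maxIter) {
        double[] x = x0.clone();
        double mOld = m(x, t, y, sigma);
        for (int it = 0; it < maxIter; it++) {
            double[] xi = fullStep(x, t, y, sigma);
            double s = 1.0;
            double[] trial = {x[0] + xi[0], x[1] + xi[1]};
            double mNew = m(trial, t, y, sigma);
            while (mNew >= mOld && s > 1e-10) { // (9.6.2) violated: replace xi by s * xi
                s *= 0.5;
                trial = new double[] {x[0] + s * xi[0], x[1] + s * xi[1]};
                mNew = m(trial, t, y, sigma);
            }
            if (mNew >= mOld) break; // no further decrease possible
            boolean converged = mOld - mNew <= 1e-10 * (1.0 + mOld);
            x = trial;
            mOld = mNew;
            if (converged) break;
        }
        return x;
    }

    public static void main(String[] args) {
        double[] t = new double[10], y = new double[10], sigma = new double[10];
        for (int j = 0; j < 10; j++) {
            t[j] = j;
            y[j] = 2.0 * Math.exp(-0.5 * j); // noise-free illustrative data
            sigma[j] = 0.1;
        }
        double[] x = fit(new double[] {1.2, 0.3}, t, y, sigma, 100);
        System.out.printf("x1 = %.6f, x2 = %.6f%n", x[0], x[1]);
    }
}
```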

The class LsqNon operates according to this procedure of iteration with step-size reduction. As in the linear case we consider the measurements as being dependent on a controlled variable $t$, i.e., $y_j = y(t_j)$. For the true values $\eta_j$ corresponding to the measurements one has

$$ \eta_j = h(x, t_j) $$

or [cf. (9.6.1)]

$$ \eta = h(x, t) . $$

This function has to be programmed by the user. That is done within an extension of the abstract class DataUserFunction; see the example programs in Sect. 9.14. The matrix $A$ is computed by numerical differentiation with the class AuxDri. Should the accuracy of that method not suffice, the user has to provide a method with the same name and the same method declarations, computing the matrix $A$ by analytic differentiation.
Fig. 9.5: Measured values and fitted Gaussian curve (above); logarithm of the measured values (below).


Example  9.4:  Fitting  a  Gaussian  curve 

In many experiments one has signals $y(t)$ that have the form of a Gaussian curve,

$$ y(t) = x_1 \exp\left( -(t - x_2)^2 / 2x_3^2 \right) . \qquad (9.6.3) $$

One wishes to determine the parameters $x_1$, $x_2$, $x_3$ that give the amplitude, the position of the maximum, and the width of the signal.

Figure 9.5 shows the result of the fit to data. The values $x_1 = 0.5$, $x_2 = 1.5$, $x_3 = 0.2$ were used as a first approximation. In fact we could have estimated significantly better initial values directly from the plot of the measurements, i.e., $x_1 \approx 1$ for the amplitude, $x_2 \approx 1.25$ for the position of the maximum, and $x_3 \approx 0.4$ for the width.

Here we have just mentioned a particularly useful aid in determining the initial approximations: the analysis of a plot of the data by eye. One can also proceed more formally, however, and consider the logarithm of (9.6.3),

$$ \ln y(t) = \left( \ln x_1 - \frac{x_2^2}{2x_3^2} \right) + \left( \frac{x_2}{x_3^2} \right) t + \left( -\frac{1}{2x_3^2} \right) t^2 . $$

This is a polynomial in $t$, which is linear in the three terms in parentheses,

$$ a_1 = \ln x_1 - \frac{x_2^2}{2x_3^2} , \qquad a_2 = \frac{x_2}{x_3^2} , \qquad a_3 = -\frac{1}{2x_3^2} . $$

By taking the logarithm, however, the distribution of the measurement errors, originally Gaussian, has been changed, so that strictly speaking one cannot determine $a_1$, $a_2$, and $a_3$ by a least-squares fit of a polynomial. For purposes of determining the first approximation we can disregard this difficulty and set the errors of the $\ln y(t_j)$ equal to one. We then determine the quantities $a_1$, $a_2$, $a_3$, and from them the values $x_1$, $x_2$, $x_3$, and use these as initial values.

The logarithms of the measured values $y_j$ are shown in the lower part of Fig. 9.5. One can see that the fit range for the parabola must be restricted to points in the bell-shaped region of the curve, since the points in the tails are subject to large fluctuations. ■
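The log-parabola recipe of Example 9.4 can be sketched in Java. The sketch fits $\ln y_j = a_1 + a_2 t_j + a_3 t_j^2$ with all errors set to one, then inverts the definitions of $a_1, a_2, a_3$:

$$ x_3 = \sqrt{-1/(2a_3)} , \qquad x_2 = -a_2/(2a_3) , \qquad x_1 = \exp\left(a_1 - a_2^2/(4a_3)\right) . $$

Some details are choices of this sketch rather than the text: points with $y_j \le 0$ are skipped, restricting the fit range to the bell-shaped region is left to the caller, and the small 3 × 3 system is solved by Cramer's rule. GaussianStartValues is a hypothetical name.

```java
// Sketch of start values for (9.6.3) from an unweighted parabola fit to ln y.
// GaussianStartValues is a hypothetical name; the caller chooses the t range.
final class GaussianStartValues {

    /** Returns {x1, x2, x3}; points with y <= 0 are skipped. */
    static double[] estimate(double[] t, double[] y) {
        double[][] n = new double[3][3];
        double[] b = new double[3];
        for (int j = 0; j < t.length; j++) {
            if (y[j] <= 0.0) continue; // logarithm undefined
            double[] p = {1.0, t[j], t[j] * t[j]};
            double ly = Math.log(y[j]);
            for (int k = 0; k < 3; k++) {
                b[k] += p[k] * ly; // all errors set to one
                for (int l = 0; l < 3; l++) n[k][l] += p[k] * p[l];
            }
        }
        double[] a = solve3(n, b); // a = {a1, a2, a3}
        if (a[2] >= 0.0) throw new IllegalStateException("parabola not concave: no Gaussian shape");
        double x3 = Math.sqrt(-1.0 / (2.0 * a[2]));
        double x2 = -a[1] / (2.0 * a[2]);
        double x1 = Math.exp(a[0] - a[1] * a[1] / (4.0 * a[2]));
        return new double[] {x1, x2, x3};
    }

    /** Cramer's rule for a small 3x3 system. */
    static double[] solve3(double[][] m, double[] b) {
        double d = det3(m);
        double[] x = new double[3];
        for (int col = 0; col < 3; col++) {
            double[][] mc = new double[3][];
            for (int i = 0; i < 3; i++) {
                mc[i] = m[i].clone();
                mc[i][col] = b[i];
            }
            x[col] = det3(mc) / d;
        }
        return x;
    }

    static double det3(double[][] m) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    }

    public static void main(String[] args) {
        double[] t = new double[10], y = new double[10];
        for (int j = 0; j < 10; j++) { // noise-free illustrative data
            t[j] = 0.8 + 0.1 * j;
            y[j] = Math.exp(-(t[j] - 1.25) * (t[j] - 1.25) / (2.0 * 0.4 * 0.4));
        }
        double[] x = estimate(t, y);
        System.out.printf("x1 = %.4f, x2 = %.4f, x3 = %.4f%n", x[0], x[1], x[2]);
    }
}
```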

Example  9.5:  Fit  of  an  exponential  function 

In studies of radioactivity, for example, a function of the form

$$ y(t) = x_1 \exp(-x_2 t) \qquad (9.6.4) $$

must be fitted to measured values $y_j(t_j)$. The program to be provided here by the user can have the following form.

In Fig. 9.6 the result of a fit to data is shown. The first approximation for the unknowns can again be obtained by fitting a straight line (graphically or numerically) to the logarithm of the function $y(t)$,

$$ \ln y(t) = \ln x_1 - x_2 t . $$

Here one usually uses only the values at small $t$, since these points have smaller relative fluctuations. ■
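The straight-line start for Example 9.5 is equally short in code. The Java sketch below fits $\ln y_j = b_0 + b_1 t_j$ with unit errors, using only points with $y_j > 0$ and $t_j \le t_{\max}$, and returns $x_1 = e^{b_0}$, $x_2 = -b_1$. The cut parameter tMax and the class name ExponentialStartValues are assumptions of this sketch.

```java
// Sketch of start values for (9.6.4) from an unweighted line fit to ln y at small t.
// ExponentialStartValues and the cut tMax are hypothetical choices for illustration.
final class ExponentialStartValues {

    /** Returns {x1, x2} from ln y = ln x1 - x2 t, using points with y > 0 and t <= tMax. */
    static double[] estimate(double[] t, double[] y, double tMax) {
        double s = 0, st = 0, stt = 0, sl = 0, stl = 0;
        for (int j = 0; j < t.length; j++) {
            if (y[j] <= 0.0 || t[j] > tMax) continue;
            double l = Math.log(y[j]);
            s += 1.0;
            st += t[j];
            stt += t[j] * t[j];
            sl += l;
            stl += t[j] * l;
        }
        double det = s * stt - st * st;
        double intercept = (stt * sl - st * stl) / det; // ln x1
        double slope = (s * stl - st * sl) / det;       // -x2
        return new double[] {Math.exp(intercept), -slope};
    }

    public static void main(String[] args) {
        double[] t = new double[10], y = new double[10];
        for (int j = 0; j < 10; j++) { t[j] = j; y[j] = 2.0 * Math.exp(-0.5 * j); } // illustrative
        double[] x = estimate(t, y, 5.0);
        System.out.printf("x1 = %.4f, x2 = %.4f%n", x[0], x[1]);
    }
}
```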
Fig. 9.6: Measured values and fitted exponential function (above); logarithms of the measured values (below).


Example  9.6:  Fitting  a  sum  of  exponential  functions 

A radioactive substance often consists of a mixture of components with different decay times. One must therefore fit a sum of several exponential functions. We will consider the case of two functions

$$ y(t) = x_1 \exp(-x_2 t) + x_3 \exp(-x_4 t) . \qquad (9.6.5) $$

Figure 9.7 shows the result of fitting a sum of two exponential functions to the data. A first approximation can be determined by fitting two different lines to $\ln y_j(t_j)$ for the regions of smaller and larger $t_j$. ■
Fig. 9.7: Measured values with the fitted sum of two exponential functions (above); logarithm of the measured values (below).


9.6.2  Marquardt  Iteration 

The procedure with step-size reduction discussed in Sect. 9.6.1 leads to a minimum when $x$ is already in a region where $A'^{\mathrm T} A'$ is positive definite, i.e., in a region around the minimum where the function $M(x)$ has a positive curvature. (In the one-dimensional case of Fig. 10.1, this is the region between the two points of inflection on either side of the minimum.) It is clear, however, that it must be possible to extend the region of convergence to the region between the two maxima surrounding the minimum. This possibility is offered by the Marquardt procedure, which is presented in Sect. 10.15 in a somewhat more general form. The class LsqMar treats the nonlinear case of least squares; the user's task is much the same as in the case of LsqNon. Readers who want to become familiar with this class should first study Sect. 10.15 and the introductory sections of Chap. 10, and finally Sect. A.17.
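To convey the idea before reading Sect. 10.15, here is a hedged Java sketch of the generic Levenberg-Marquardt scheme. The scheme is not taken from the text, and it is not necessarily identical to the procedure of Sect. 10.15 or to the class LsqMar. The sketch makes these assumptions:

- a damping term $\lambda$ is added to the diagonal of $A'^{\mathrm T}A'$ (Levenberg's form, without Marquardt's rescaling by the diagonal);
- $\lambda$ is divided by 10 after a successful step and multiplied by 10 after a failed one;
- the hypothetical model $h(x, t) = x_1 \exp(-x_2 t)$ and noise-free illustrative data are used.

For large $\lambda$ the step turns toward the direction of steepest descent and becomes short; for small $\lambda$ it approaches the linearized step (9.5.10). MarquardtSketch is a hypothetical name.

```java
// Sketch of a generic Levenberg-Marquardt iteration (damping lambda on the diagonal of
// A'^T A'); not necessarily the procedure of Sect. 10.15 or the class LsqMar.
// Hypothetical model: h(x, t) = x1 * exp(-x2 * t), uncorrelated errors.
final class MarquardtSketch {

    static double h(double[] x, double t) {
        return x[0] * Math.exp(-x[1] * t);
    }

    static double m(double[] x, double[] t, double[] y, double[] sigma) {
        double sum = 0.0;
        for (int j = 0; j < t.length; j++) {
            double d = (y[j] - h(x, t[j])) / sigma[j];
            sum += d * d;
        }
        return sum;
    }

    static double[] fit(double[] x0, double[] t, double[] y, double[] sigma) {
        double[] x = x0.clone();
        double mOld = m(x, t, y, sigma);
        double lambda = 1e-3;
        for (int it = 0; it < 500; it++) {
            double n00 = 0, n01 = 0, n11 = 0, b0 = 0, b1 = 0; // A'^T A' and -A'^T c' at x
            for (int j = 0; j < t.length; j++) {
                double w = 1.0 / (sigma[j] * sigma[j]);
                double e = Math.exp(-x[1] * t[j]);
                double a0 = -e, a1 = x[0] * t[j] * e; // a_jl = -dh/dx_l
                double c = y[j] - h(x, t[j]);
                n00 += w * a0 * a0;
                n01 += w * a0 * a1;
                n11 += w * a1 * a1;
                b0 -= w * a0 * c;
                b1 -= w * a1 * c;
            }
            double d00 = n00 + lambda, d11 = n11 + lambda; // (A'^T A' + lambda I) xi = -A'^T c'
            double det = d00 * d11 - n01 * n01;
            double[] trial = {x[0] + (d11 * b0 - n01 * b1) / det, x[1] + (d00 * b1 - n01 * b0) / det};
            double mNew = m(trial, t, y, sigma);
            if (mNew < mOld) { // success: accept and move toward the linearized step
                boolean converged = mOld - mNew <= 1e-12 * (1.0 + mOld);
                x = trial;
                mOld = mNew;
                lambda /= 10.0;
                if (converged) break;
            } else { // failure: shorter step, closer to steepest descent
                lambda *= 10.0;
                if (lambda > 1e12) break;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double[] t = new double[10], y = new double[10], sigma = new double[10];
        for (int j = 0; j < 10; j++) { t[j] = j; y[j] = 2.0 * Math.exp(-0.5 * j); sigma[j] = 0.1; }
        double[] x = fit(new double[] {1.0, 1.0}, t, y, sigma);
        System.out.printf("x1 = %.6f, x2 = %.6f%n", x[0], x[1]);
    }
}
```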


Example  9.7:  Fitting  a  sum  of  two  Gaussian  functions  and  a  polynomial 

In practice it is usually not so easy to find the amplitude, position, and width of signals as was shown in Example 9.4. One usually has more than one signal, lying on a background that varies slowly with the controlled variable $t$. Since in general this background is not well known, it is approximated by a line or a second-order polynomial. We will consider the sum of such a polynomial and two Gaussian distributions, i.e., a function of nine unknown parameters,

$$ h(x, t) = x_1 + x_2 t + x_3 t^2 + x_4 \exp\{-(x_5 - t)^2 / 2x_6^2\} + x_7 \exp\{-(x_8 - t)^2 / 2x_9^2\} . \qquad (9.6.6) $$


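The derivatives (9.5.5) are

$$ -a_{j1} = \frac{\partial h_j}{\partial x_1} = 1 , \qquad -a_{j2} = \frac{\partial h_j}{\partial x_2} = t_j , \qquad -a_{j3} = \frac{\partial h_j}{\partial x_3} = t_j^2 , $$

$$ -a_{j4} = \frac{\partial h_j}{\partial x_4} = \exp\{-(x_5 - t_j)^2 / 2x_6^2\} , $$

$$ -a_{j5} = \frac{\partial h_j}{\partial x_5} = 2x_4 \exp\{-(x_5 - t_j)^2 / 2x_6^2\} \, \frac{t_j - x_5}{2x_6^2} , $$

$$ -a_{j6} = \frac{\partial h_j}{\partial x_6} = x_4 \exp\{-(x_5 - t_j)^2 / 2x_6^2\} \, \frac{(t_j - x_5)^2}{x_6^3} , $$

and analogously for $x_7$, $x_8$, $x_9$.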


If  the  numerical  differentiation  fails,  the  derivatives  must  be  computed  with  a 
specially  written  version  of  the  routine AuxDri.  For  the  numerical  example 
shown  in  Fig.  9.8,  however,  this  is  not  necessary.  The  user  must  supply,  of 
course,  an  extension  of  DataUserFunction  to  compute  (9.6.6). 
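The analytic derivatives above are easy to get wrong, so it is worth checking them against central differences before using them. The Java sketch below implements (9.6.6) and its gradient $\partial h/\partial x_l = -a_{jl}$, treating both Gaussians with the same code. It is a standalone illustration: it does not use the DataUserFunction or AuxDri interfaces, whose exact signatures are not given here, and the class name TwoGaussiansOnPolynomial is hypothetical.

```java
// Sketch of the function (9.6.6) and its analytic gradient; a_jl = -gradient[l] at t = t_j.
// TwoGaussiansOnPolynomial is a hypothetical name, independent of DataUserFunction/AuxDri.
final class TwoGaussiansOnPolynomial {

    /** h(x, t) with x[0..8] = x1..x9. */
    static double h(double[] x, double t) {
        double g1 = Math.exp(-(x[4] - t) * (x[4] - t) / (2.0 * x[5] * x[5]));
        double g2 = Math.exp(-(x[7] - t) * (x[7] - t) / (2.0 * x[8] * x[8]));
        return x[0] + x[1] * t + x[2] * t * t + x[3] * g1 + x[6] * g2;
    }

    /** dh/dx_l for l = 1..9 (array index l - 1). */
    static double[] gradient(double[] x, double t) {
        double[] g = new double[9];
        g[0] = 1.0;
        g[1] = t;
        g[2] = t * t;
        for (int k = 0; k < 2; k++) { // k = 0: x4, x5, x6; k = 1: x7, x8, x9
            int i = 3 + 3 * k;
            double amp = x[i], pos = x[i + 1], wid = x[i + 2];
            double e = Math.exp(-(pos - t) * (pos - t) / (2.0 * wid * wid));
            g[i] = e;                                                  // d/d amplitude
            g[i + 1] = amp * e * (t - pos) / (wid * wid);              // d/d position
            g[i + 2] = amp * e * (t - pos) * (t - pos) / (wid * wid * wid); // d/d width
        }
        return g;
    }

    public static void main(String[] args) {
        double[] x = {0.1, 0.2, -0.05, 1.0, 1.0, 0.2, 0.8, 2.0, 0.3}; // illustrative values
        System.out.println(java.util.Arrays.toString(gradient(x, 1.1)));
    }
}
```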

Figure 9.8 shows the result of fitting to a total of 50 measurements. It is perhaps interesting to look at some of the intermediate steps in Fig. 9.9. As a first approximation the parameters of the polynomial were set to zero ($x_1 = x_2 = x_3 = 0$). For both of the clearly visible signals, rough estimates of the amplitude, position, and width were used for the initial values. The function (9.6.6) with the parameters of this first approximation is shown in the first frame (STEP 0). In the following frames one sees the change of the function after 2, 4, 6, and 8 steps, and finally at convergence of the class LsqMar after a total of 9 steps. ■

Fig. 9.8: Measured values and fitted sum of a second-order polynomial and two Gaussian functions.


9.7 Properties of the Least-Squares Solution: $\chi^2$-Test

Up to now the method of least squares has merely been an application of the maximum-likelihood method to a linear problem. The prescription of least squares (9.2.15) was obtained directly from the maximization of the likelihood function (9.2.14). In order to be able to specify this likelihood function in the first place, it was necessary to know the distribution of the measurement errors. We assumed a normal distribution. But even when there is no exact knowledge of the error distribution, one can still apply the relation (9.2.15) and with it the remaining formulas of the last sections. Such a procedure seems to lack a theoretical justification. The Gauss-Markov theorem, however, states that in this case as well, the method of least squares provides results with desirable properties. Before entering into this, let us list once more the properties of the maximum-likelihood solution.
Fig. 9.9: Successive approximations of the fit function to the measurements.


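(a) The solution $\tilde{\mathbf{x}}$ is asymptotically unbiased, i.e., for very large samples,

$$ E(\tilde{x}_i) = x_i \,, \qquad i = 1, 2, \ldots, r \,. $$

(b) It is a minimum variance estimator, i.e.,

$$ \sigma^2(\tilde{x}_i) = E\{(\tilde{x}_i - x_i)^2\} = \min \,. $$

(c) The quantity (9.2.16)

$$ M = \tilde{\boldsymbol{\varepsilon}}^{\mathrm T} G_y \tilde{\boldsymbol{\varepsilon}} $$

follows a $\chi^2$-distribution with $n - r$ degrees of freedom.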
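The properties (a) and (b) are familiar from Chap. ?. We will demonstrate the validity of (c) for the simple case of direct measurements ($r = 1$), for which the matrix $G_y$ is diagonal:

$$ G_y = \begin{pmatrix} 1/\sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n^2 \end{pmatrix} . $$

The quantity $M$ then simply becomes a sum of squares,

$$ M = \sum_{j=1}^{n} \tilde{\varepsilon}_j^2 / \sigma_j^2 \,. \tag{9.7.1} $$

Since each $\varepsilon_j$ comes from a normally distributed population with mean zero and variance $\sigma_j^2$, the quantities $\varepsilon_j / \sigma_j$ are described by a standard Gaussian distribution. The residuals $\tilde{\varepsilon}_j = y_j - \tilde{x}$, however, are formed with the estimate $\tilde{x}$ (the weighted mean) obtained from the same data. They therefore satisfy one linear relation, $\sum_j \tilde{\varepsilon}_j / \sigma_j^2 = 0$. Thus the sum of squares follows a $\chi^2$-distribution with $n - 1$ degrees of freedom.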

If the distribution of the errors $\varepsilon_j$ is not known, then the least-squares solution has the following properties:

(a) The solution is unbiased.

(b) Of all solutions $\mathbf{x}^*$ that are unbiased estimators for $\mathbf{x}$ and are linear combinations of the measurements $\mathbf{y}$, the least-squares solution has the smallest variance. (This is the Gauss-Markov theorem.)

(c) The expectation value of

$$ M = \tilde{\boldsymbol{\varepsilon}}^{\mathrm T} G_y \tilde{\boldsymbol{\varepsilon}} $$

is

$$ E(M) = n - r \,. $$

(This is exactly the expectation value of a $\chi^2$ variable for $n - r$ degrees of freedom.)
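Property (c) can be made plausible with a small simulation. The Java sketch below takes direct measurements ($r = 1$) with deliberately non-Gaussian errors, uniform with variance $\sigma_j^2$. It forms the weighted mean and $M$ from (9.7.1) and averages $M$ over many data sets. The class name, the true value, the $\sigma_j$, and the seed are arbitrary assumptions. The average should come out close to $n - 1$.

```java
import java.util.Random;

/**
 * Sketch (hypothetical class): Monte Carlo check of E(M) = n - r for direct
 * measurements (r = 1) with uniform, i.e. non-Gaussian, errors.
 */
final class ExpectedMinimumFunction {

    /** M of (9.7.1): squared residuals about the weighted mean, weighted with 1/sigma_j^2. */
    static double minimumFunction(double[] y, double[] sigma) {
        double sw = 0.0, swy = 0.0;
        for (int j = 0; j < y.length; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            sw += w;
            swy += w * y[j];
        }
        double xTilde = swy / sw;               // least-squares estimate = weighted mean
        double m = 0.0;
        for (int j = 0; j < y.length; j++) {
            double r = (y[j] - xTilde) / sigma[j];
            m += r * r;
        }
        return m;
    }

    /** Average of M over many simulated data sets. */
    static double meanOfM(double[] sigma, int trials, long seed) {
        Random rnd = new Random(seed);
        double eta = 10.0, sum = 0.0;           // arbitrary true value
        double[] y = new double[sigma.length];
        for (int k = 0; k < trials; k++) {
            for (int j = 0; j < y.length; j++) {
                // uniform on [-sqrt(3) sigma, +sqrt(3) sigma] has variance sigma^2
                double u = 2.0 * rnd.nextDouble() - 1.0;
                y[j] = eta + Math.sqrt(3.0) * sigma[j] * u;
            }
            sum += minimumFunction(y, sigma);
        }
        return sum / trials;
    }

    public static void main(String[] args) {
        double[] sigma = {0.5, 1.0, 1.0, 2.0, 3.0};
        // n = 5, r = 1: the average should be close to 4
        System.out.println(meanOfM(sigma, 100000, 42L));
    }
}
```

Replacing the uniform errors by `rnd.nextGaussian()` leaves the average unchanged. Only the shape of the distribution of $M$ changes, and for Gaussian errors it becomes the $\chi^2$-distribution of property (c) of the first list.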

The quantity $M$ is often simply called $\chi^2$, although it does not necessarily follow a $\chi^2$-distribution. Together with the matrices $C_{\tilde{x}}$ and $C_{\tilde{\eta}}$, it provides a convenient measure of the quality of a fit with the least-squares method. If the value of $M$ obtained from the measurements comes out much larger than $n - r$, then the assumptions of the calculation must be carefully checked. The result should not simply be accepted without critical examination.

The number $f = n - r$ is called the number of degrees of freedom of a fit or the number of equations of constraint of the fit. It is clear from the beginning (see Appendix A) that the problem of least squares can only be solved for $f \ge 0$. Only for $f > 0$, however, is the quantity $M$ meaningfully defined and usable for testing the quality of a fit.

If it is known that the errors are normally distributed, then a $\chi^2$-test can be done in conjunction with the fit. One rejects the result of the fit if

$$ M = \tilde{\boldsymbol{\varepsilon}}^{\mathrm T} G_y \tilde{\boldsymbol{\varepsilon}} > \chi^2_{1-\alpha}(n - r) \,, \tag{9.7.2} $$

i.e., if the quantity $M$ exceeds the $(1 - \alpha)$-quantile of the $\chi^2$-distribution with $n - r$ degrees of freedom, corresponding to a significance level $\alpha$. A larger value of $M$ can be caused by the following reasons (and also by an error of the first kind):
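The test (9.7.2) needs the quantile $\chi^2_{1-\alpha}(n - r)$. The Java sketch below computes the distribution function of the $\chi^2$-distribution from the power series of the regularized incomplete gamma function and inverts it by bisection. This numerical route is an assumption of the sketch and is not the method used in the book's library. The class name is hypothetical. The series is adequate for the moderate values of $M$ and $f$ met in typical fits.

```java
/**
 * Sketch (hypothetical class) of the chi-square test (9.7.2).
 * F(M) = P(f/2, M/2), the regularized lower incomplete gamma function,
 * evaluated with its power series; quantiles by bisection.
 */
final class Chi2Test {

    /** ln Gamma(a) for integer or half-integer a, by exact recursion. */
    static double lnGammaHalfInteger(double a) {
        boolean integer = Math.abs(a - Math.rint(a)) < 1e-12;
        double b = integer ? 1.0 : 0.5;
        double g = integer ? 0.0 : 0.5 * Math.log(Math.PI); // Gamma(1) = 1, Gamma(1/2) = sqrt(pi)
        while (b < a - 1e-12) {                              // Gamma(b + 1) = b Gamma(b)
            g += Math.log(b);
            b += 1.0;
        }
        return g;
    }

    /** Distribution function F(m) of the chi-square distribution with f degrees of freedom. */
    static double cdf(double m, int f) {
        if (m <= 0.0) return 0.0;
        double a = 0.5 * f, x = 0.5 * m;
        // P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{k>=0} x^k / ((a+1)(a+2)...(a+k))
        double term = 1.0, sum = 1.0;
        for (int k = 1; k < 10000; k++) {
            term *= x / (a + k);
            sum += term;
            if (term < 1e-16 * sum) break;
        }
        double lnPrefactor = a * Math.log(x) - x - lnGammaHalfInteger(a + 1.0);
        return Math.min(1.0, sum * Math.exp(lnPrefactor));
    }

    /** Quantile chi2_p(f), found by bracketing and bisection on cdf. */
    static double quantile(double p, int f) {
        double lo = 0.0, hi = 1.0;
        while (cdf(hi, f) < p) hi *= 2.0;
        for (int it = 0; it < 200; it++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, f) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** Decision (9.7.2): true if the fit is rejected at significance level alpha. */
    static boolean reject(double m, int n, int r, double alpha) {
        return m > quantile(1.0 - alpha, n - r);
    }

    public static void main(String[] args) {
        System.out.println(quantile(0.95, 2));          // about 5.99
        System.out.println(reject(12.0, 7, 2, 0.05));   // chi2_0.95(5) is about 11.07
    }
}
```

For example, `reject(12.0, 7, 2, 0.05)` compares $M = 12$ with $\chi^2_{0.95}(5) \approx 11.07$ and therefore rejects the fit. Whether that is due to an error of the first kind or to one of the reasons listed next cannot be decided from $M$ alone.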

(a) The assumed functional dependence $\mathbf{f}(\mathbf{x}, \boldsymbol{\eta}) = 0$ between the measured values $\boldsymbol{\eta}$ and the unknown parameters $\mathbf{x}$ is false. Either the function $\mathbf{f}(\mathbf{x}, \boldsymbol{\eta})$ is completely wrong, or some of the parameters taken to be known are not correct.

(b) The function $\mathbf{f}(\mathbf{x}, \boldsymbol{\eta})$ is correct, but the series expansion truncated after the linear term is not a sufficiently good approximation in the region of parameter space covered in the computation.

(c) The initial approximation $\mathbf{x}_0$ is too far from the true value $\mathbf{x}$. Better values for $\mathbf{x}_0$ could lead to more acceptable values of $M$. This point is clearly related to (b).

(d) The covariance matrix of the measurements $C_y$, which is often only based on rough estimates or assumptions, is not correct.

These four points must be carefully taken into consideration if the method of least squares is to be applied successfully. In many cases the least-squares computation is repeated many times for different data sets. One can then look at the empirical frequency distribution of the quantity $M$ and compare it to a $\chi^2$-distribution with the corresponding number of degrees of freedom. Such a comparison is particularly useful for obtaining a good estimate of $C_y$. At the start of a new experiment, the apparatus that provides the measured values $\mathbf{y}$ is usually checked by measuring known quantities. In this way the parameters $\mathbf{x}$ for several data sets are determined. The covariance matrix of the measurements can then be determined such that the distribution (i.e., the histogram) of $M$ agrees as well as possible with a $\chi^2$-distribution. This investigation is particularly illustrative when one considers the distribution of $F(M)$ instead of that of $M$, where $F$ is the distribution function (6.6.11) of the $\chi^2$-distribution. If $M$ follows a $\chi^2$-distribution, then $F(M)$ follows a uniform distribution.
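The uniformity of $F(M)$ is easy to see in a simulation. The Java sketch below assumes $n = 3$ direct measurements with correctly known $\sigma_j$, so that $f = 2$ and the distribution function has the closed form $F(M) = 1 - \exp(-M/2)$. The $\sigma_j$, the seed, and the class name are arbitrary. The histogram of $F(M)$ in ten bins should then be flat at about 0.1 per bin.

```java
import java.util.Random;

/**
 * Sketch (hypothetical class): histogram of F(M) for n = 3 direct measurements
 * with normal errors, where M has f = 2 degrees of freedom and
 * F(M) = 1 - exp(-M/2).
 */
final class UniformFOfM {

    static double[] histogramOfF(double[] sigma, int trials, int bins, long seed) {
        if (sigma.length != 3) throw new IllegalArgumentException("sketch assumes n = 3");
        Random rnd = new Random(seed);
        int[] counts = new int[bins];
        double[] y = new double[3];
        for (int k = 0; k < trials; k++) {
            double sw = 0.0, swy = 0.0;
            for (int j = 0; j < 3; j++) {
                y[j] = 5.0 + sigma[j] * rnd.nextGaussian();   // true value 5 (arbitrary)
                double w = 1.0 / (sigma[j] * sigma[j]);
                sw += w;
                swy += w * y[j];
            }
            double xTilde = swy / sw;                          // weighted mean
            double m = 0.0;
            for (int j = 0; j < 3; j++) {
                double r = (y[j] - xTilde) / sigma[j];
                m += r * r;
            }
            double fOfM = 1.0 - Math.exp(-0.5 * m);            // chi-square distribution function, f = 2
            counts[Math.min(bins - 1, (int) (fOfM * bins))]++;
        }
        double[] freq = new double[bins];
        for (int b = 0; b < bins; b++) freq[b] = counts[b] / (double) trials;
        return freq;
    }

    public static void main(String[] args) {
        double[] freq = histogramOfF(new double[] {1.0, 2.0, 0.5}, 100000, 10, 7L);
        for (double v : freq) System.out.printf("%.4f%n", v); // each close to 0.1
    }
}
```

If the $\sigma_j$ used in the fit were too small, $M$ would be inflated and the histogram would pile up near $F = 1$. If they were too large, it would pile up near $F = 0$. This is the behaviour one exploits when adjusting $C_y$.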
9.8 Confidence Regions and Asymmetric Errors in the Nonlinear Case

We begin this section with a brief review of the meaning of the covariance matrix $C_{\tilde{x}}$ of the unknown parameters for the linear least-squares case. The probability that the true value of the unknowns is given by $\mathbf{x}$ is described by a normal distribution of the form (5.10.1),

$$ \phi(\mathbf{x}) = k \exp\{-\tfrac{1}{2} (\mathbf{x} - \tilde{\mathbf{x}})^{\mathrm T} B (\mathbf{x} - \tilde{\mathbf{x}})\} \,, \tag{9.8.1} $$

where $B = C_{\tilde{x}}^{-1}$. The exponent (multiplied by $-2$),

$$ g(\mathbf{x}) = (\mathbf{x} - \tilde{\mathbf{x}})^{\mathrm T} B (\mathbf{x} - \tilde{\mathbf{x}}) \,, \tag{9.8.2} $$

describes the covariance ellipsoid for $g = 1 = \text{const}$ (cf. Sect. 5.10). For other values of $g(\mathbf{x}) = \text{const}$ one obtains confidence ellipsoids for a given probability $W$ [see (5.10.19)]. In this way one can easily construct confidence ellipsoids that have the point $\mathbf{x} = \tilde{\mathbf{x}}$ at the center and contain the true value $\mathbf{x}$ of the unknowns with a given probability, e.g., $W = 0.95$.

We can now find a relation between the confidence ellipsoid and the minimum function. For this we use the expression (9.2.19), which is exactly true in the linear case. We compute the difference between the minimum function $M(\mathbf{x})$ at a point $\mathbf{x}$ and the minimum function $M(\tilde{\mathbf{x}})$ at the point $\tilde{\mathbf{x}}$ where it is a minimum,

$$ M(\mathbf{x}) - M(\tilde{\mathbf{x}}) = (\mathbf{x} - \tilde{\mathbf{x}})^{\mathrm T} A^{\mathrm T} G_y A (\mathbf{x} - \tilde{\mathbf{x}}) \,. \tag{9.8.3} $$

According to (9.2.27), one has

$$ B = C_{\tilde{x}}^{-1} = G_{\tilde{x}} = A^{\mathrm T} G_y A \,. \tag{9.8.4} $$

So the difference (9.8.3) is exactly the function $g(\mathbf{x})$ introduced above.
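Relation (9.8.3) can be checked numerically for a linear problem. The Java sketch below assumes a straight-line model $\eta_j = x_1 + x_2 t_j$ with uncorrelated errors; the data and the class name are made up. It computes $\tilde{\mathbf{x}}$ and $B = A^{\mathrm T} G_y A$ and compares $M(\mathbf{x}) - M(\tilde{\mathbf{x}})$ with $g(\mathbf{x})$ at a displaced point. For a linear model the two agree up to rounding.

```java
/**
 * Sketch (hypothetical class): for the linear model eta_j = x1 + x2 t_j with
 * G_y = diag(1/sigma_j^2), M(x) - M(x~) = (x - x~)^T A^T G_y A (x - x~) exactly.
 */
final class QuadraticMinimumFunction {

    /** M(x) = sum_j ((y_j - x1 - x2 t_j) / sigma_j)^2. */
    static double m(double[] x, double[] t, double[] y, double[] s) {
        double sum = 0.0;
        for (int j = 0; j < t.length; j++) {
            double r = (y[j] - x[0] - x[1] * t[j]) / s[j];
            sum += r * r;
        }
        return sum;
    }

    /** B = A^T G_y A for A with rows (1, t_j). */
    static double[][] normalMatrix(double[] t, double[] s) {
        double b11 = 0, b12 = 0, b22 = 0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (s[j] * s[j]);
            b11 += w;
            b12 += w * t[j];
            b22 += w * t[j] * t[j];
        }
        return new double[][] {{b11, b12}, {b12, b22}};
    }

    /** Least-squares solution x~ = B^{-1} A^T G_y y, with the 2 x 2 inverse written out. */
    static double[] solve(double[] t, double[] y, double[] s) {
        double[][] b = normalMatrix(t, s);
        double v1 = 0, v2 = 0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (s[j] * s[j]);
            v1 += w * y[j];
            v2 += w * t[j] * y[j];
        }
        double det = b[0][0] * b[1][1] - b[0][1] * b[0][1];
        return new double[] {(b[1][1] * v1 - b[0][1] * v2) / det,
                             (b[0][0] * v2 - b[0][1] * v1) / det};
    }

    /** g(x) = (x - x~)^T B (x - x~), cf. (9.8.2). */
    static double g(double[] x, double[] xTilde, double[][] b) {
        double d1 = x[0] - xTilde[0], d2 = x[1] - xTilde[1];
        return b[0][0] * d1 * d1 + 2.0 * b[0][1] * d1 * d2 + b[1][1] * d2 * d2;
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3, 4, 5};
        double[] y = {1.1, 2.9, 5.2, 6.8, 9.1, 11.0};
        double[] s = {0.2, 0.2, 0.3, 0.3, 0.4, 0.4};
        double[] xt = solve(t, y, s);
        double[][] b = normalMatrix(t, s);
        double[] x = {xt[0] + 0.3, xt[1] - 0.1};
        System.out.println(m(x, t, y, s) - m(xt, t, y, s)); // same value as the next line
        System.out.println(g(x, xt, b));
    }
}
```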

Thus the covariance ellipsoid is in fact the hypersurface in the space of the $r$ variables $x_1, \ldots, x_r$ where the function $M(\mathbf{x})$ has the value $M(\mathbf{x}) = M(\tilde{\mathbf{x}}) + 1$. Correspondingly, the confidence ellipsoid with probability $W$ is the hypersurface for which

$$ M(\mathbf{x}) = M(\tilde{\mathbf{x}}) + g \,, \tag{9.8.5} $$

where, according to (5.10.21), the constant $g$ is the quantile of the $\chi^2$-distribution with $r$ degrees of freedom. Here $r$ is the number of unknowns, i.e., the dimension of the ellipsoid:

$$ g = \chi^2_W(r) \,. \tag{9.8.6} $$

In the nonlinear case of least squares, our considerations remain approximately valid. The approximation becomes better the smaller the nonlinear deviations of the expression (9.5.3) are in the region where the unknown parameters are varied. The deviations are small not only when the derivatives are almost constant, i.e., when the function is almost linear, but also when the variation of the unknown parameters is small. A measure of the variation of an unknown is, however, its error. Because of error propagation, the errors of the unknowns are small when the original errors of the measurements are small. In the nonlinear case, therefore, the covariance matrix retains the same interpretation as in the linear case as long as the measurement errors are small.

If the measurement errors are large, we retain the interpretation (9.8.5). That is, we determine the hypersurface with (9.8.5) and state that the true value $\mathbf{x}$ lies with probability $W$ within the confidence region around the point $\tilde{\mathbf{x}}$. This region is, however, no longer an ellipsoid.

For only one parameter $x$, the curve $M = M(x)$ can easily be plotted. The confidence region is simply a segment of the $x$ axis. For two parameters $x_1, x_2$, the boundary of the confidence region is a curve in the $(x_1, x_2)$ plane. It is the contour line (9.8.5) of the function $M = M(\mathbf{x})$. These contours are drawn by the method DatanGraphics.drawContour, cf. Appendix F.5. For more than two parameters, a graphical representation is only possible in the form of slices of the confidence region. Each slice lies in an $(x_i, x_j)$ plane of the $(x_1, x_2, \ldots, x_r)$ space and passes through the point $\tilde{\mathbf{x}} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_r)$, where $x_i, x_j$ can be any possible pair of variables.


Example 9.8: The influence of large measurement errors on the confidence region of the parameters for fitting an exponential function

The results of fitting an exponential function as in Example 9.5 to data points with errors of different sizes are shown in Fig. 9.10. The two fitted parameters $\tilde{x}_1$ and $\tilde{x}_2$, their errors, and their correlation coefficient are also shown in the figure. The quantities $\Delta x_1$, $\Delta x_2$, and $\rho$ were computed directly from the elements of the covariance matrix $C_{\tilde{x}}$. As expected, these errors increase for increasing measurement errors.

The covariance ellipses for this fit are shown as thin lines in the various frames of Fig. 9.11. The small circle in the center indicates the solution $\tilde{\mathbf{x}}$. In addition, the error bars through the point $\tilde{\mathbf{x}}$ mark the regions $\tilde{x}_1 \pm \Delta x_1$ and $\tilde{x}_2 \pm \Delta x_2$. A thicker line shows the contour that surrounds the confidence region $M(\mathbf{x}) = M(\tilde{\mathbf{x}}) + 1$. We see that for small measurement errors the confidence region and covariance ellipse are practically identical, while for large measurement errors they clearly differ from each other. ■


Fig. 9.10 panel labels:
$\tilde{x}_1 = 9.67$, $\tilde{x}_2 = 0.98$, $\Delta x_1 = 0.40$, $\Delta x_2 = 0.06$, $\rho = 0.80$;
$\tilde{x}_1 = 8.56$, $\tilde{x}_2 = 0.90$, $\Delta x_1 = 1.56$, $\Delta x_2 = 0.23$, $\rho = 0.80$;
$\tilde{x}_1 = 9.32$, $\tilde{x}_2 = 0.96$, $\Delta x_1 = 0.80$, $\Delta x_2 = 0.11$, $\rho = 0.80$;
$\tilde{x}_1 = 6.39$, $\tilde{x}_2 = 0.62$, $\Delta x_1 = 2.75$, $\Delta x_2 = 0.47$, $\rho = 0.83$.

Fig. 9.10: Fit of an exponential function $\eta = x_1 \exp(-x_2 t)$ to measurements with errors of different sizes.

Computing and plotting the confidence regions requires considerable effort, to the extent that they differ from the confidence ellipsoids. It is therefore important to be able to decide whether a clear difference between the two exists. For clarity we will stay with the case of two variables and consider again Example 9.8 and in particular Fig. 9.11. The errors $\Delta x_i = \sigma_i = \sqrt{C_{\tilde{x}}(i,i)}$ obtained from the covariance matrix $C_{\tilde{x}}$ have the following property: the lines $x_i = \tilde{x}_i \pm \Delta x_i$ are tangent to the covariance ellipse. If the covariance ellipse must be replaced by a confidence region of less regular form, we can nevertheless find the horizontal and vertical tangents. At those places in Fig. 9.11 one has

$$ x_i = \tilde{x}_i + \Delta x_{i+} \,, \qquad x_i = \tilde{x}_i - \Delta x_{i-} \,. \tag{9.8.7} $$

Because of the loss of symmetry, the errors $\Delta x_{i+}$ and $\Delta x_{i-}$ are in general different. One speaks of asymmetric errors. For $r$ variables one has tangent hypersurfaces of dimension $r - 1$ instead of tangent lines.

Fig. 9.11: Results of the fit from Fig. 9.10 in the parameter space $x_1, x_2$. Solution $\tilde{\mathbf{x}}$ (small circle), symmetric errors (error bars), covariance ellipse, asymmetric error limits (horizontal and vertical lines), confidence region corresponding to the probability of the covariance ellipse (dark contour).

We now give a procedure to compute the asymmetric errors $\Delta x_{i\pm}$. One is only interested in asymmetric confidence regions when the asymmetric errors are significantly different from the symmetric ones. The values $x_{i\pm} = \tilde{x}_i \pm \Delta x_{i\pm}$ have the following property: the minimum of the function $M$ for a fixed value $x_i = x_{i\pm}$ is

$$ \min\{M(\mathbf{x}; x_i = x_{i\pm})\} = M(\tilde{\mathbf{x}}) + g \tag{9.8.8} $$

with $g = 1$. For other values of $g$ one obtains corresponding asymmetric confidence boundaries. If we bring all the terms in (9.8.8) to the left-hand side,

$$ \min\{M(\mathbf{x}; x_i = x_{i\pm})\} - M(\tilde{\mathbf{x}}) - g = 0 \,, \tag{9.8.9} $$

we see that the problem to be solved is a combination of minimization and zero-finding. We find the zeros using an iterative procedure as in Sect. E.2. The zero, i.e., the point $x_i$ fulfilling (9.8.9), is first bracketed by the values $x_{\text{small}}$ and $x_{\text{big}}$, corresponding to negative and positive values of (9.8.9). Next, this interval is reduced by successively dividing it in half until the expression (9.8.9) differs from zero by no more than $g/100$. The minimum in (9.8.9) is found with LsqNon or LsqMar. The asymmetric errors are determined by the class LsqAsn if LsqNon is used, and by LsqAsm if LsqMar is used. The asymmetric errors for Example 9.8 are shown in Fig. 9.11.
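The bracketing-and-bisection procedure can be sketched in Java for the model $\eta = x_1 \exp(-x_2 t)$ of Example 9.8. To keep the sketch self-contained, the inner minimization over $x_1$ at fixed $x_2$ is done in closed form. This is possible only because $x_1$ enters linearly; the book's classes LsqAsn and LsqAsm use LsqNon or LsqMar for this step instead. The data, the initial step, and the class name are made-up assumptions. The outer minimum is found with a golden-section search, which assumes the profile is unimodal on the given interval.

```java
/**
 * Sketch (hypothetical class): asymmetric errors of x2 for eta = x1 exp(-x2 t)
 * via (9.8.9). min over x1 of M is computed in closed form; the zero of
 * profile(x2) - M(x~) - g is bracketed and then found by interval halving
 * until it differs from zero by no more than g/100.
 */
final class AsymmetricErrors {
    private final double[] t, y, s;

    AsymmetricErrors(double[] t, double[] y, double[] s) {
        this.t = t; this.y = y; this.s = s;
    }

    /** min over x1 of M(x1, x2) at fixed x2; the optimal x1 is sye / see. */
    double profile(double x2) {
        double see = 0.0, sye = 0.0, syy = 0.0;
        for (int j = 0; j < t.length; j++) {
            double w = 1.0 / (s[j] * s[j]);
            double e = Math.exp(-x2 * t[j]);
            see += w * e * e;
            sye += w * y[j] * e;
            syy += w * y[j] * y[j];
        }
        return syy - sye * sye / see;
    }

    /** Golden-section search for the minimum of the profile on [a, b]. */
    double minimizeProfile(double a, double b) {
        double r = 0.5 * (Math.sqrt(5.0) - 1.0);
        while (b - a > 1e-10) {
            double c = b - r * (b - a), d = a + r * (b - a);
            if (profile(c) < profile(d)) b = d; else a = c;
        }
        return 0.5 * (a + b);
    }

    /** Zero of profile(x2) - mMin - g on one side of x2Tilde (direction +1 or -1). */
    double boundary(double x2Tilde, double mMin, double g, double direction, double step) {
        double small = x2Tilde, big = x2Tilde + direction * step;
        while (profile(big) - mMin - g < 0.0) {             // bracket: double the distance
            small = big;
            big = x2Tilde + 2.0 * (big - x2Tilde);
            if (Math.abs(big - x2Tilde) > 1e6) throw new IllegalStateException("no bracket");
        }
        double mid = 0.5 * (small + big);
        double val = profile(mid) - mMin - g;
        for (int it = 0; it < 200 && Math.abs(val) > g / 100.0; it++) { // interval halving
            if (val < 0.0) small = mid; else big = mid;
            mid = 0.5 * (small + big);
            val = profile(mid) - mMin - g;
        }
        return mid;
    }

    public static void main(String[] args) {
        double[] t = {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0};
        double[] y = {6.3, 3.5, 2.4, 1.2, 1.0, 0.4, 0.5, 0.1};
        double[] s = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
        AsymmetricErrors fit = new AsymmetricErrors(t, y, s);
        double x2 = fit.minimizeProfile(0.01, 5.0);
        double mMin = fit.profile(x2);
        double up = fit.boundary(x2, mMin, 1.0, +1.0, 0.05);
        double lo = fit.boundary(x2, mMin, 1.0, -1.0, 0.05);
        System.out.printf("x2 = %.4f  +%.4f  -%.4f%n", x2, up - x2, x2 - lo);
    }
}
```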


9.9 Constrained Measurements

We now return to the case of Sect. 9.1, where the quantities of interest were directly measured. The $n$ measurements are, however, no longer completely independent, but rather are related to each other by $q$ equations of constraint. One could measure, for example, the three angles of a triangle. The equation of constraint says that their sum is equal to 180°. We again ask for the best estimates $\tilde{\eta}_j$ for the quantities $\eta_j$. The measurements give instead the values

$$ y_j = \eta_j + \varepsilon_j \,, \qquad j = 1, 2, \ldots, n \,. \tag{9.9.1} $$

As above, let us assume a normal distribution about zero for the measurement errors $\varepsilon_j$:

$$ E(\varepsilon_j) = 0 \,, \qquad E(\varepsilon_j^2) = \sigma_j^2 \,. $$

The $q$ equations of constraint have the form

$$ f_k(\boldsymbol{\eta}) = 0 \,, \qquad k = 1, 2, \ldots, q \,. \tag{9.9.2} $$

Let us first consider the simple case of linear equations of constraint. The Eqs. (9.9.2) are then of the form

$$
\begin{aligned}
b_{10} + b_{11}\eta_1 + b_{12}\eta_2 + \cdots + b_{1n}\eta_n &= 0 \,, \\
b_{20} + b_{21}\eta_1 + b_{22}\eta_2 + \cdots + b_{2n}\eta_n &= 0 \,, \\
&\;\;\vdots \\
b_{q0} + b_{q1}\eta_1 + b_{q2}\eta_2 + \cdots + b_{qn}\eta_n &= 0 \,,
\end{aligned}
\tag{9.9.3}
$$

or in matrix notation,

$$ B\boldsymbol{\eta} + \mathbf{b}_0 = 0 \,. \tag{9.9.4} $$

9.9.1 The Method of Elements

The method of elements is not well suited for automatic processing of data, but it is an illustrative procedure. We can use the $q$ equations (9.9.3) to eliminate $q$ of the $n$ quantities $\eta_j$. The remaining $n - q$ quantities $\alpha_i$ ($i = 1, 2, \ldots, n - q$) are called elements. They can be chosen arbitrarily from the original $\eta_j$, or they can be linear combinations of them. We can then express the full vector $\boldsymbol{\eta}$ as a set of linear combinations of these elements,

$$ \eta_j = f_{j0} + f_{j1}\alpha_1 + f_{j2}\alpha_2 + \cdots + f_{j,n-q}\alpha_{n-q} \,, \qquad j = 1, 2, \ldots, n \,, \tag{9.9.5} $$

or

$$ \boldsymbol{\eta} = F\boldsymbol{\alpha} + \mathbf{f}_0 \,. \tag{9.9.6} $$

Equation (9.9.6) is of the same type as (9.2.2). The solution must thus be of the form of (9.2.26), i.e.,

$$ \tilde{\boldsymbol{\alpha}} = (F^{\mathrm T} G_y F)^{-1} F^{\mathrm T} G_y (\mathbf{y} - \mathbf{f}_0) \tag{9.9.7} $$

describes the estimation of the elements $\boldsymbol{\alpha}$ according to the method of least squares. The corresponding covariance matrix is

$$ G_{\tilde{\alpha}}^{-1} = (F^{\mathrm T} G_y F)^{-1} \,, \tag{9.9.8} $$

cf. (9.2.27). The improved measurements are obtained by substituting (9.9.7) into (9.9.6):

$$ \tilde{\boldsymbol{\eta}} = F\tilde{\boldsymbol{\alpha}} + \mathbf{f}_0 = F(F^{\mathrm T} G_y F)^{-1} F^{\mathrm T} G_y (\mathbf{y} - \mathbf{f}_0) + \mathbf{f}_0 \,. \tag{9.9.9} $$

By error propagation the covariance matrix is found to be

$$ G_{\tilde{\eta}}^{-1} = F(F^{\mathrm T} G_y F)^{-1} F^{\mathrm T} = F G_{\tilde{\alpha}}^{-1} F^{\mathrm T} \,. \tag{9.9.10} $$


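Example 9.9: Constraint between the angles of a triangle

Suppose measurements of the angles of a triangle have yielded the values $y_1 = 89°$, $y_2 = 31°$, $y_3 = 62°$, i.e.,

$$ \mathbf{y} = \begin{pmatrix} 89 \\ 31 \\ 62 \end{pmatrix} . $$

The linear equation of constraint is

$$ \eta_1 + \eta_2 + \eta_3 = 180 \,. $$

It can be written as

$$ B\boldsymbol{\eta} + \mathbf{b}_0 = 0 $$

with

$$ B = (1, 1, 1) \,, \qquad \mathbf{b}_0 = b_0 = -180 \,. $$

As elements we choose $\eta_1$ and $\eta_2$. The system (9.9.5) then becomes

$$ \eta_1 = \alpha_1 \,, \qquad \eta_2 = \alpha_2 \,, \qquad \eta_3 = 180 - \alpha_1 - \alpha_2 \,, $$

or

$$ \boldsymbol{\eta} = F\boldsymbol{\alpha} + \mathbf{f}_0 \,, $$

i.e.,

$$ F = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{pmatrix} , \qquad \mathbf{f}_0 = \begin{pmatrix} 0 \\ 0 \\ 180 \end{pmatrix} . $$

We assume a measurement error of 1° for each angle, i.e.,

$$ C_y = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I \,, \qquad G_y = C_y^{-1} = I \,. $$

Using these in (9.9.7) gives

$$ \tilde{\boldsymbol{\alpha}} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}^{-1} \begin{pmatrix} 207 \\ 149 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} 207 \\ 149 \end{pmatrix} = \begin{pmatrix} 88\frac{1}{3} \\ 30\frac{1}{3} \end{pmatrix} . $$

Using (9.9.9) one finally obtains

$$ \tilde{\boldsymbol{\eta}} = F\tilde{\boldsymbol{\alpha}} + \mathbf{f}_0 = \begin{pmatrix} 88\frac{1}{3} \\ 30\frac{1}{3} \\ 61\frac{1}{3} \end{pmatrix} . $$

This result was clearly to be expected. The "excess" of 2° in the measured sum of angles is subtracted equally from the three measurements. This would not have been the case, however, if the individual measurements had errors of different magnitudes. The reader can easily repeat the computation for such a case. The residual errors of the improved measurements can be determined by application of (9.9.10),

$$ G_{\tilde{\eta}}^{-1} = \frac{1}{3} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix} . $$

The residual error of each angle is thus equal to $\sqrt{2/3} \approx 0.82$. ■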
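The computation of Example 9.9 can be sketched in Java as follows, including the case of unequal errors that is left to the reader. The class name is hypothetical. Uncorrelated errors are assumed so that $G_y$ is diagonal, and the $2 \times 2$ inverse in (9.9.7) is written out explicitly.

```java
import java.util.Arrays;

/**
 * Sketch (hypothetical class): method of elements for the triangle of Example 9.9.
 * Elements alpha = (eta1, eta2), eta = F alpha + f0 with F = ((1,0),(0,1),(-1,-1))
 * and f0 = (0,0,180); G_y = diag(1/sigma_j^2).
 */
final class TriangleElements {

    /** Improved angles eta~ from (9.9.7) and (9.9.9). */
    static double[] fit(double[] y, double[] sigma) {
        double[][] f = {{1, 0}, {0, 1}, {-1, -1}};
        double[] f0 = {0, 0, 180};
        double[][] n = new double[2][2];   // F^T G_y F
        double[] v = new double[2];        // F^T G_y (y - f0)
        for (int j = 0; j < 3; j++) {
            double w = 1.0 / (sigma[j] * sigma[j]);
            for (int a = 0; a < 2; a++) {
                v[a] += f[j][a] * w * (y[j] - f0[j]);
                for (int b = 0; b < 2; b++) n[a][b] += f[j][a] * w * f[j][b];
            }
        }
        double det = n[0][0] * n[1][1] - n[0][1] * n[1][0];
        double alpha1 = (n[1][1] * v[0] - n[0][1] * v[1]) / det;
        double alpha2 = (n[0][0] * v[1] - n[1][0] * v[0]) / det;
        double[] eta = new double[3];
        for (int j = 0; j < 3; j++) eta[j] = f[j][0] * alpha1 + f[j][1] * alpha2 + f0[j];
        return eta;
    }

    public static void main(String[] args) {
        double[] y = {89, 31, 62};
        // equal errors of 1 degree: 88 1/3, 30 1/3, 61 1/3
        System.out.println(Arrays.toString(fit(y, new double[] {1, 1, 1})));
        // third angle measured less precisely
        System.out.println(Arrays.toString(fit(y, new double[] {1, 1, 2})));
    }
}
```

With $\sigma = (1, 1, 2)$, the excess of 2° is split as $1/3$, $1/3$, and $4/3$, i.e., in proportion to the variances. The less precise third angle therefore absorbs most of the correction.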
At this point we want to make a general statement on measurements related by equations of constraint. Although in the statistical methods used so far we have found no way of dealing with systematic errors, equations of constraint offer such a possibility in many cases. If, for example, the sum of angles in many measurements is observed to be greater than 180° more frequently than less, then one can conclude that the measurement apparatus has a systematic error.


9.9.2 The Method of Lagrange Multipliers

Instead of elements, one more frequently uses the method of Lagrange multipliers. Although both methods clearly give the same results, the latter has the advantage that all unknowns are treated in the same way, so the user is spared having to choose elements. The method of Lagrange multipliers is a well-known procedure in differential calculus for the determination of extrema in problems with additional constraints.

We begin again from the linear system of equations of constraint (9.9.4),

$$ B\boldsymbol{\eta} + \mathbf{b}_0 = 0 \,, $$

and we recall that the measured quantities are the sum of the true value $\boldsymbol{\eta}$ and the measurement error $\boldsymbol{\varepsilon}$,

$$ \mathbf{y} = \boldsymbol{\eta} + \boldsymbol{\varepsilon} \,. $$

Thus one has

$$ B\mathbf{y} - B\boldsymbol{\varepsilon} + \mathbf{b}_0 = 0 \,. \tag{9.9.11} $$

Since $\mathbf{y}$ is known from the measurement, and $\mathbf{b}_0$ and $B$ are also constructed from known quantities, we can construct a column vector with $q$ elements,

$$ \mathbf{c} = B\mathbf{y} + \mathbf{b}_0 \,, \tag{9.9.12} $$

which does not contain any unknowns. Thus (9.9.11) can be written in the form

$$ \mathbf{c} - B\boldsymbol{\varepsilon} = 0 \,. \tag{9.9.13} $$

We now introduce an additional column vector with $q$ elements. Its elements, not yet known, are the Lagrange multipliers,

$$ \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix} . \tag{9.9.14} $$
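Using this we extend the original minimum function (9.2.16)

$$ M = \boldsymbol{\varepsilon}^{\mathrm T} G_y \boldsymbol{\varepsilon} $$

to

$$ L = \boldsymbol{\varepsilon}^{\mathrm T} G_y \boldsymbol{\varepsilon} + 2\boldsymbol{\mu}^{\mathrm T} (\mathbf{c} - B\boldsymbol{\varepsilon}) \,. \tag{9.9.15} $$

The function $L$ is called the Lagrange function. The requirement

$$ M = \min $$

with the constraint

$$ \mathbf{c} - B\boldsymbol{\varepsilon} = 0 $$

is then fulfilled when the total differential of the Lagrange function vanishes, i.e., when

$$ dL = 2\boldsymbol{\varepsilon}^{\mathrm T} G_y \, d\boldsymbol{\varepsilon} - 2\boldsymbol{\mu}^{\mathrm T} B \, d\boldsymbol{\varepsilon} = 0 \,. $$

This is equivalent to

$$ \boldsymbol{\varepsilon}^{\mathrm T} G_y - \boldsymbol{\mu}^{\mathrm T} B = 0 \,. \tag{9.9.16} $$

The system (9.9.16) consists of $n$ equations containing a total of $n + q$ unknowns, $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ and $\mu_1, \mu_2, \ldots, \mu_q$. In addition we have the $q$ equations of constraint (9.9.13). We transpose (9.9.16) and obtain

$$ G_y \boldsymbol{\varepsilon} = B^{\mathrm T} \boldsymbol{\mu} \,, \qquad \boldsymbol{\varepsilon} = G_y^{-1} B^{\mathrm T} \boldsymbol{\mu} \,. \tag{9.9.17} $$

By substitution into (9.9.13) we obtain

$$ \mathbf{c} - B G_y^{-1} B^{\mathrm T} \boldsymbol{\mu} = 0 \,, $$

which can easily be solved for $\boldsymbol{\mu}$:

$$ \tilde{\boldsymbol{\mu}} = (B G_y^{-1} B^{\mathrm T})^{-1} \mathbf{c} \,. \tag{9.9.18} $$

With (9.9.17) we thus have the least-squares estimators of the measurement errors,

$$ \tilde{\boldsymbol{\varepsilon}} = G_y^{-1} B^{\mathrm T} (B G_y^{-1} B^{\mathrm T})^{-1} \mathbf{c} \,. \tag{9.9.19} $$

The estimators of the unknowns are then given by (9.9.1),

$$ \tilde{\boldsymbol{\eta}} = \mathbf{y} - \tilde{\boldsymbol{\varepsilon}} = \mathbf{y} - G_y^{-1} B^{\mathrm T} (B G_y^{-1} B^{\mathrm T})^{-1} \mathbf{c} \,. \tag{9.9.20} $$

With the abbreviation

$$ G_B = (B G_y^{-1} B^{\mathrm T})^{-1} $$

this becomes

$$ \tilde{\boldsymbol{\eta}} = \mathbf{y} - G_y^{-1} B^{\mathrm T} G_B \mathbf{c} \,. \tag{9.9.21} $$

The covariance matrices of $\tilde{\boldsymbol{\mu}}$ and $\tilde{\boldsymbol{\eta}}$ are easily obtained by applying error propagation to the linear systems of equations (9.9.18) and (9.9.19),

$$ G_{\tilde{\mu}}^{-1} = (B G_y^{-1} B^{\mathrm T})^{-1} = G_B \,, \tag{9.9.22} $$

$$ G_{\tilde{\eta}}^{-1} = G_y^{-1} - G_y^{-1} B^{\mathrm T} G_B B G_y^{-1} \,. \tag{9.9.23} $$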
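Equations (9.9.12), (9.9.21), and (9.9.23) translate directly into matrix code. The Java sketch below works with $C_y = G_y^{-1}$ and uses plain arrays and a small Gauss-Jordan inverse for $G_B$, so that it is self-contained. The class and method names are hypothetical, and in practice a numerical library would normally be used instead.

```java
import java.util.Arrays;

/**
 * Sketch (hypothetical class): linear constraints B eta + b0 = 0 by Lagrange
 * multipliers, with C_y = G_y^{-1}:
 *   c = B y + b0 (9.9.12),  G_B = (B C_y B^T)^{-1},
 *   eta~ = y - C_y B^T G_B c (9.9.21),  C_eta~ = C_y - C_y B^T G_B B C_y (9.9.23).
 */
final class LagrangeLinearConstraints {

    static double[][] mul(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int l = 0; l < b.length; l++) c[i][j] += a[i][l] * b[l][j];
        return c;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) t[j][i] = a[i][j];
        return t;
    }

    /** Inverse of a small nonsingular matrix (Gauss-Jordan with partial pivoting). */
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int p = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[p][col])) p = r;
            double[] tmp = m[col]; m[col] = m[p]; m[p] = tmp;
            double piv = m[col][col];
            for (int j = 0; j < 2 * n; j++) m[col][j] /= piv;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = m[r][col];
                for (int j = 0; j < 2 * n; j++) m[r][j] -= f * m[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }

    /** Improved measurements (9.9.21). */
    static double[] eta(double[] y, double[][] cy, double[][] b, double[] b0) {
        double[] c = new double[b.length];                   // c = B y + b0
        for (int k = 0; k < b.length; k++) {
            c[k] = b0[k];
            for (int j = 0; j < y.length; j++) c[k] += b[k][j] * y[j];
        }
        double[][] cyBt = mul(cy, transpose(b));             // C_y B^T
        double[][] gain = mul(cyBt, invert(mul(b, cyBt)));   // C_y B^T G_B
        double[] eta = y.clone();
        for (int j = 0; j < y.length; j++)
            for (int k = 0; k < b.length; k++) eta[j] -= gain[j][k] * c[k];
        return eta;
    }

    /** Covariance matrix of the improved measurements (9.9.23). */
    static double[][] covariance(double[][] cy, double[][] b) {
        double[][] cyBt = mul(cy, transpose(b));
        double[][] corr = mul(mul(cyBt, invert(mul(b, cyBt))), transpose(cyBt)); // C_y B^T G_B B C_y
        double[][] result = new double[cy.length][cy.length];
        for (int i = 0; i < cy.length; i++)
            for (int j = 0; j < cy.length; j++) result[i][j] = cy[i][j] - corr[i][j];
        return result;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 1, 1}};                          // one constraint: angle sum
        double[] b0 = {-180};
        double[][] cy = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double[] y = {89, 31, 62};
        System.out.println(Arrays.toString(eta(y, cy, b, b0)));
        System.out.println(Arrays.deepToString(covariance(cy, b)));
    }
}
```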
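Example 9.10: Application of the method of Lagrange multipliers to Example 9.9

We apply the method of Lagrange multipliers to the problem of Example 9.9. We then have

$$ \mathbf{c} = B\mathbf{y} + \mathbf{b}_0 = (1, 1, 1) \begin{pmatrix} 89 \\ 31 \\ 62 \end{pmatrix} - 180 = 182 - 180 = 2 \,, $$

and in addition,

$$ G_B = (B G_y^{-1} B^{\mathrm T})^{-1} = \left[ (1, 1, 1) \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right]^{-1} = 3^{-1} = \frac{1}{3} $$

and

$$ G_y^{-1} B^{\mathrm T} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} . $$

We can now compute (9.9.21),

$$ \tilde{\boldsymbol{\eta}} = \begin{pmatrix} 89 \\ 31 \\ 62 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \frac{1}{3} \cdot 2 = \begin{pmatrix} 88\frac{1}{3} \\ 30\frac{1}{3} \\ 61\frac{1}{3} \end{pmatrix} . $$

The covariance matrices are then

$$ G_{\tilde{\mu}}^{-1} = \frac{1}{3} \,, $$

$$ G_{\tilde{\eta}}^{-1} = I - \frac{1}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} (1, 1, 1) = \frac{1}{3} \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix} . $$

■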


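We now generalize the method of Lagrange multipliers to the case of nonlinear equations of constraint having the general form (9.9.2), i.e.,

$$ f_k(\boldsymbol{\eta}) = 0 \,, \qquad k = 1, 2, \ldots, q \,. $$

These equations can be expanded in a series about $\boldsymbol{\eta}_0$,

$$ f_k(\boldsymbol{\eta}) = f_k(\boldsymbol{\eta}_0) + \left(\frac{\partial f_k}{\partial \eta_1}\right)_{\boldsymbol{\eta}_0} (\eta_1 - \eta_{10}) + \cdots + \left(\frac{\partial f_k}{\partial \eta_n}\right)_{\boldsymbol{\eta}_0} (\eta_n - \eta_{n0}) \,. \tag{9.9.24} $$

Here the $\eta_{j0}$ are the first approximations for the true values $\eta_j$. Using the definitions

$$ c_k = f_k(\boldsymbol{\eta}_0) \,, \qquad b_{k\ell} = \left(\frac{\partial f_k}{\partial \eta_\ell}\right)_{\boldsymbol{\eta}_0} , \qquad \delta_\ell = (\eta_\ell - \eta_{\ell 0}) \,, $$

$$ B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{q1} & b_{q2} & \cdots & b_{qn} \end{pmatrix} , \qquad \mathbf{c} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_q \end{pmatrix} , \qquad \boldsymbol{\delta} = \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_n \end{pmatrix} , $$

we can write (9.9.24) in the form

$$ B\boldsymbol{\delta} + \mathbf{c} = 0 \,. \tag{9.9.25} $$

Except for a sign, this relation corresponds to (9.9.13). The solution $\tilde{\boldsymbol{\delta}}$ can therefore be read off (9.9.19),

$$ \tilde{\boldsymbol{\delta}} = -G_y^{-1} B^{\mathrm T} (B G_y^{-1} B^{\mathrm T})^{-1} \mathbf{c} \,. \tag{9.9.26} $$

As first approximations $\boldsymbol{\eta}_0$ we use the measured values $\mathbf{y}$,

$$ \boldsymbol{\eta}_0 = \mathbf{y} \,, \tag{9.9.27} $$

and obtain

$$ \tilde{\boldsymbol{\eta}} = \boldsymbol{\eta}_0 + \tilde{\boldsymbol{\delta}} \,. \tag{9.9.28} $$

For linear constraint equations this is already the solution.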

If the equations are nonlinear, an iteration is performed. The prescription for step $i$ of the iteration is described for the general case at the end of the next section. If all terms containing the matrix $A$ are set to zero in the formulas given there, one obtains the iteration procedure for the case of constrained measurements.

For each step $i$ one computes

$$ M_i = \boldsymbol{\varepsilon}_i^{\mathrm T} G_y \boldsymbol{\varepsilon}_i \,, \qquad \boldsymbol{\varepsilon}_i = \mathbf{y} - \boldsymbol{\eta}_i \,. \tag{9.9.29} $$

The procedure is terminated as convergent if a further step leads to no appreciable reduction of $M_i$. We call the result $\tilde{\boldsymbol{\eta}}$.
The covariance matrix is still given by (9.9.23), i.e.,

$$ G_{\tilde{\eta}}^{-1} = G_y^{-1} - G_y^{-1} B^{\mathrm T} G_B B G_y^{-1} \,, \tag{9.9.30} $$

if the elements of $B$ are computed at $\tilde{\boldsymbol{\eta}}$. The best estimates of the measurement errors are computed using

$$ \tilde{\boldsymbol{\varepsilon}} = \mathbf{y} - \tilde{\boldsymbol{\eta}} \,. \tag{9.9.31} $$

With them the minimum function is found to be

$$ M = \tilde{\boldsymbol{\varepsilon}}^{\mathrm T} G_y \tilde{\boldsymbol{\varepsilon}} \,. \tag{9.9.32} $$

This quantity can again be used for a $\chi^2$-test with $q$ degrees of freedom.
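The iteration for nonlinear constraints can be illustrated with a single constraint. The Java sketch below assumes a measured point $(y_1, y_2)$ that must lie on a circle of radius $R$, $f(\boldsymbol{\eta}) = \eta_1^2 + \eta_2^2 - R^2 = 0$, with uncorrelated errors. Each step linearizes $f$ at $\boldsymbol{\eta}_{i-1}$ and sets $\boldsymbol{\eta}_i = \mathbf{y} - G_y^{-1} B^{\mathrm T} G_B \left(\mathbf{c} + B(\mathbf{y} - \boldsymbol{\eta}_{i-1})\right)$, with $B$ and $\mathbf{c} = f(\boldsymbol{\eta}_{i-1})$ evaluated at $\boldsymbol{\eta}_{i-1}$. This is the step of Sect. 9.10 with all terms containing $A$ dropped, using $\mathbf{s}^{(i)} = \boldsymbol{\eta}_{i-1} - \mathbf{y}$. The stopping rule (relative change of $M_i$ below $10^{-12}$), the data, and the names are assumptions of the sketch.

```java
/**
 * Sketch (hypothetical class): one nonlinear constraint
 * f(eta) = eta1^2 + eta2^2 - R^2 = 0, uncorrelated errors sigma_1, sigma_2.
 * Step i: eta_i = y - C_y B^T G_B (c + B (y - eta_{i-1})), G_B = (B C_y B^T)^{-1},
 * with B = (2 eta1, 2 eta2) and c = f, both evaluated at eta_{i-1}.
 */
final class CircleConstraint {

    static double[] fit(double[] y, double[] sigma, double radius, int maxSteps) {
        double v1 = sigma[0] * sigma[0], v2 = sigma[1] * sigma[1];  // diagonal of C_y
        double[] eta = y.clone();                                   // eta_0 = y, cf. (9.9.27)
        double mPrev = Double.NaN;
        for (int i = 1; i <= maxSteps; i++) {
            double b1 = 2.0 * eta[0], b2 = 2.0 * eta[1];            // B at eta_{i-1}
            double c = eta[0] * eta[0] + eta[1] * eta[1] - radius * radius;
            double gB = 1.0 / (b1 * v1 * b1 + b2 * v2 * b2);        // a scalar, since q = 1
            double rhs = c + b1 * (y[0] - eta[0]) + b2 * (y[1] - eta[1]);
            eta = new double[] {y[0] - v1 * b1 * gB * rhs, y[1] - v2 * b2 * gB * rhs};
            double m = minimumFunction(y, eta, sigma);              // M_i, cf. (9.9.29)
            if (Math.abs(m - mPrev) <= 1e-12 * (1.0 + m)) break;    // M_i no longer changes
            mPrev = m;
        }
        return eta;
    }

    /** M = eps^T G_y eps with eps = y - eta, cf. (9.9.32); q = 1 degree of freedom. */
    static double minimumFunction(double[] y, double[] eta, double[] sigma) {
        double e1 = (y[0] - eta[0]) / sigma[0], e2 = (y[1] - eta[1]) / sigma[1];
        return e1 * e1 + e2 * e2;
    }

    public static void main(String[] args) {
        double[] y = {3.1, 3.9};
        double[] sigma = {0.1, 0.1};
        double[] eta = fit(y, sigma, 5.0, 50);
        System.out.printf("eta = (%.6f, %.6f), M = %.4f%n",
                eta[0], eta[1], minimumFunction(y, eta, sigma));
    }
}
```

For equal errors, the result must be the radial projection $R\,\mathbf{y}/|\mathbf{y}|$ of the measured point onto the circle, which gives a convenient check. For unequal errors, the less precisely measured coordinate receives the larger correction.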

Although the method of Lagrange multipliers is mathematically elegant, the programs provided here use the method of orthogonal transformations (see Sect. 9.12).
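After the preparation of the previous sections we can now take up the general case of fitting with the method of least squares.

We first recall the notation. The $r$ unknown parameters are placed in a vector $\mathbf{x}$. The quantities to be measured form an $n$-vector $\boldsymbol{\eta}$. The values $\mathbf{y}$ actually measured differ from $\boldsymbol{\eta}$ by the errors $\boldsymbol{\varepsilon}$. We will assume a normal distribution for the individual errors $\varepsilon_j$ ($j = 1, 2, \ldots, n$). That is, the $n$ variables $\varepsilon_j$ are normally distributed, with the null vector as the vector of expectation values and with the covariance matrix $C_y = G_y^{-1}$. The vectors $\mathbf{x}$ and $\boldsymbol{\eta}$ are related by $m$ functions

$$ f_k(\mathbf{x}, \boldsymbol{\eta}) = f_k(\mathbf{x}, \mathbf{y} - \boldsymbol{\varepsilon}) = 0 \,, \qquad k = 1, 2, \ldots, m \,. \tag{9.10.1} $$

We further assume that we have already obtained in some way a first approximation $\mathbf{x}_0$ for the unknowns. As a first approximation for $\boldsymbol{\eta}$ we use $\boldsymbol{\eta}_0 = \mathbf{y}$, as in Sect. 9.9. Finally we require that the functions $f_k$ can be approximated by linear functions in the range of variability of our problem. This is the region around $(\mathbf{x}_0, \boldsymbol{\eta}_0)$ given by the differences $\mathbf{x} - \mathbf{x}_0$ and $\boldsymbol{\eta} - \boldsymbol{\eta}_0$. We can then write

$$
\begin{aligned}
f_k(\mathbf{x}, \boldsymbol{\eta}) = f_k(\mathbf{x}_0, \boldsymbol{\eta}_0)
&+ \left(\frac{\partial f_k}{\partial x_1}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} (x_1 - x_{10}) + \cdots + \left(\frac{\partial f_k}{\partial x_r}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} (x_r - x_{r0}) \\
&+ \left(\frac{\partial f_k}{\partial \eta_1}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} (\eta_1 - \eta_{10}) + \cdots + \left(\frac{\partial f_k}{\partial \eta_n}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} (\eta_n - \eta_{n0}) \,.
\end{aligned}
\tag{9.10.2}
$$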
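With the abbreviations

$$ a_{k\ell} = \left(\frac{\partial f_k}{\partial x_\ell}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} , \qquad A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{pmatrix} , \tag{9.10.3} $$

$$ b_{k\ell} = \left(\frac{\partial f_k}{\partial \eta_\ell}\right)_{\mathbf{x}_0, \boldsymbol{\eta}_0} , \qquad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix} , \tag{9.10.4} $$

$$ c_k = f_k(\mathbf{x}_0, \boldsymbol{\eta}_0) \,, \qquad \mathbf{c} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} , \tag{9.10.5} $$

$$ \boldsymbol{\xi} = \mathbf{x} - \mathbf{x}_0 \,, \qquad \boldsymbol{\delta} = \boldsymbol{\eta} - \boldsymbol{\eta}_0 \,, \tag{9.10.6} $$

the system of equations (9.10.2) can be written as follows:

$$ A\boldsymbol{\xi} + B\boldsymbol{\delta} + \mathbf{c} = 0 \,. \tag{9.10.7} $$

Since $\boldsymbol{\eta}_0 = \mathbf{y}$, one has $\boldsymbol{\delta} = -\boldsymbol{\varepsilon}$, so the first term of the Lagrange function is the minimum function $M = \boldsymbol{\varepsilon}^{\mathrm T} G_y \boldsymbol{\varepsilon}$. The Lagrange function is

$$ L = \boldsymbol{\delta}^{\mathrm T} G_y \boldsymbol{\delta} + 2\boldsymbol{\mu}^{\mathrm T} (A\boldsymbol{\xi} + B\boldsymbol{\delta} + \mathbf{c}) \,. \tag{9.10.8} $$

Here $\boldsymbol{\mu}$ is an $m$-vector of the Lagrange multipliers. We require that the total derivative of (9.10.8) with respect to $\boldsymbol{\delta}$ vanishes. This is equivalent to requiring

$$ G_y \boldsymbol{\delta} + B^{\mathrm T} \boldsymbol{\mu} = 0 $$

or

$$ \boldsymbol{\delta} = -G_y^{-1} B^{\mathrm T} \boldsymbol{\mu} \,. \tag{9.10.9} $$

Substitution into (9.10.7) gives

$$ A\boldsymbol{\xi} - B G_y^{-1} B^{\mathrm T} \boldsymbol{\mu} + \mathbf{c} = 0 \tag{9.10.10} $$

or

$$ \boldsymbol{\mu} = G_B (A\boldsymbol{\xi} + \mathbf{c}) \,, \tag{9.10.11} $$

where

$$ G_B = (B G_y^{-1} B^{\mathrm T})^{-1} \,. \tag{9.10.12} $$

With (9.10.9) we can now write

$$ \boldsymbol{\delta} = -G_y^{-1} B^{\mathrm T} G_B (A\boldsymbol{\xi} + \mathbf{c}) \,. \tag{9.10.13} $$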
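Since the Lagrange function $L$ is a minimum also with respect to $\boldsymbol{\xi}$, the total derivative of (9.10.8) with respect to $\boldsymbol{\xi}$ must also vanish, i.e.,

$$ 2\boldsymbol{\mu}^{\mathrm T} A = 0 \,. $$

By transposing and substituting (9.10.11) one obtains

$$ 2A^{\mathrm T} G_B (A\boldsymbol{\xi} + \mathbf{c}) = 0 $$

or

$$ \tilde{\boldsymbol{\xi}} = -(A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B \mathbf{c} \,. \tag{9.10.14} $$

Substituting (9.10.14) into (9.10.13) and (9.10.11) immediately gives the estimates of the deviations $\boldsymbol{\delta}$ and the Lagrange multipliers $\boldsymbol{\mu}$,

$$ \tilde{\boldsymbol{\delta}} = -G_y^{-1} B^{\mathrm T} G_B \left(\mathbf{c} - A (A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B \mathbf{c}\right) , \tag{9.10.15} $$

$$ \tilde{\boldsymbol{\mu}} = G_B \left(\mathbf{c} - A (A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B \mathbf{c}\right) . \tag{9.10.16} $$

The estimates for the parameters $\mathbf{x}$ and for the improved measurements $\boldsymbol{\eta}$ are

$$ \tilde{\mathbf{x}} = \mathbf{x}_0 + \tilde{\boldsymbol{\xi}} \,, \tag{9.10.17} $$

$$ \tilde{\boldsymbol{\eta}} = \boldsymbol{\eta}_0 + \tilde{\boldsymbol{\delta}} \,. \tag{9.10.18} $$

From (9.10.14), (9.10.4), and (9.10.5) we obtain for the matrix of derivatives of the elements of $\tilde{\boldsymbol{\xi}}$ with respect to the elements of $\mathbf{y}$

$$ \frac{\partial \tilde{\boldsymbol{\xi}}}{\partial \mathbf{y}} = -(A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B \frac{\partial \mathbf{c}}{\partial \mathbf{y}} = -(A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B B \,. $$

Using error propagation one obtains the covariance matrix

$$ G_{\tilde{x}}^{-1} = G_{\tilde{\xi}}^{-1} = (A^{\mathrm T} G_B A)^{-1} \,. \tag{9.10.19} $$

Correspondingly one finds

$$ G_{\tilde{\eta}}^{-1} = G_y^{-1} - G_y^{-1} B^{\mathrm T} G_B B G_y^{-1} + G_y^{-1} B^{\mathrm T} G_B A (A^{\mathrm T} G_B A)^{-1} A^{\mathrm T} G_B B G_y^{-1} \,. \tag{9.10.20} $$

One can show that the minimum function $M$ follows a $\chi^2$-distribution with $m - r$ degrees of freedom under the assumed conditions, i.e., sufficient linearity of (9.10.2) and normally distributed measurement errors. $M$ can also be written in the form

$$ M = (B\tilde{\boldsymbol{\varepsilon}})^{\mathrm T} G_B (B\tilde{\boldsymbol{\varepsilon}}) \,, \qquad \tilde{\boldsymbol{\varepsilon}} = \mathbf{y} - \tilde{\boldsymbol{\eta}} \,. \tag{9.10.21} $$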
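One step of the general case, (9.10.14) to (9.10.21), is sketched below in Java with plain arrays. The class and record names are hypothetical, and Java 16 or later is assumed for the record. The caller supplies $A$, $B$, and $\mathbf{c}$ evaluated at $(\mathbf{x}_0, \mathbf{y})$. As a usage example, a straight-line fit is written in the general form $f_k = \eta_k - x_1 - x_2 t_k = 0$. Then $B$ is the unit matrix and row $k$ of $A$ is $(-1, -t_k)$. In this special case, the result must coincide with the ordinary weighted straight-line fit.

```java
import java.util.Arrays;

/**
 * Sketch (hypothetical names): one step of the general case with C_y = G_y^{-1}.
 *   G_B = (B C_y B^T)^{-1}; xi~ = -(A^T G_B A)^{-1} A^T G_B c          (9.10.14)
 *   delta~ = -C_y B^T G_B (A xi~ + c); C_x~ = (A^T G_B A)^{-1}        (9.10.13), (9.10.19)
 *   M = (B eps~)^T G_B (B eps~), eps~ = y - eta~                       (9.10.21)
 */
final class GeneralLeastSquaresStep {

    record Result(double[] x, double[] eta, double[][] cx, double m) {}

    static double[][] mul(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int l = 0; l < b.length; l++) c[i][j] += a[i][l] * b[l][j];
        return c;
    }

    static double[][] tr(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) t[j][i] = a[i][j];
        return t;
    }

    /** Gauss-Jordan inverse with partial pivoting (small nonsingular matrices). */
    static double[][] inv(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) { System.arraycopy(a[i], 0, m[i], 0, n); m[i][n + i] = 1.0; }
        for (int col = 0; col < n; col++) {
            int p = col;
            for (int r = col + 1; r < n; r++) if (Math.abs(m[r][col]) > Math.abs(m[p][col])) p = r;
            double[] tmp = m[col]; m[col] = m[p]; m[p] = tmp;
            double piv = m[col][col];
            for (int j = 0; j < 2 * n; j++) m[col][j] /= piv;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = m[r][col];
                for (int j = 0; j < 2 * n; j++) m[r][j] -= f * m[col][j];
            }
        }
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, out[i], 0, n);
        return out;
    }

    static double[][] col(double[] v) {
        double[][] c = new double[v.length][1];
        for (int i = 0; i < v.length; i++) c[i][0] = v[i];
        return c;
    }

    static Result step(double[] x0, double[] y, double[][] cy, double[][] a, double[][] b, double[] c) {
        double[][] cyBt = mul(cy, tr(b));                      // C_y B^T
        double[][] gB = inv(mul(b, cyBt));                     // G_B (9.10.12)
        double[][] atGb = mul(tr(a), gB);                      // A^T G_B
        double[][] cx = inv(mul(atGb, a));                     // C_x~ (9.10.19)
        double[][] minusXi = mul(cx, mul(atGb, col(c)));       // equals -xi~
        double[] x = new double[x0.length];
        for (int l = 0; l < x.length; l++) x[l] = x0[l] - minusXi[l][0];
        double[][] aXiPlusC = col(c);                          // A xi~ + c
        for (int k = 0; k < c.length; k++)
            for (int l = 0; l < x.length; l++) aXiPlusC[k][0] -= a[k][l] * minusXi[l][0];
        double[][] minusDelta = mul(cyBt, mul(gB, aXiPlusC));  // equals -delta~
        double[] eta = new double[y.length], eps = new double[y.length];
        for (int j = 0; j < y.length; j++) {
            eta[j] = y[j] - minusDelta[j][0];                  // eta~ = eta0 + delta~, eta0 = y
            eps[j] = y[j] - eta[j];
        }
        double[][] be = mul(b, col(eps));
        return new Result(x, eta, cx, mul(tr(be), mul(gB, be))[0][0]);
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3, 4, 5};
        double[] y = {1.1, 2.9, 5.2, 6.8, 9.1, 11.0};
        double[] s = {0.2, 0.2, 0.3, 0.3, 0.4, 0.4};
        int n = t.length;
        double[][] cy = new double[n][n], a = new double[n][2], b = new double[n][n];
        for (int k = 0; k < n; k++) {
            cy[k][k] = s[k] * s[k];
            a[k][0] = -1.0;            // df_k/dx1 for f_k = eta_k - x1 - x2 t_k
            a[k][1] = -t[k];           // df_k/dx2
            b[k][k] = 1.0;             // df_k/deta_k
        }
        Result r = step(new double[] {0.0, 0.0}, y, cy, a, b, y.clone()); // c_k = f_k(x0, y) = y_k
        System.out.println(Arrays.toString(r.x()) + "  M = " + r.m());
    }
}
```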
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If  the  Eqs.  (9.10.1)  are  linear,  then  the  relations  (9.10.17)  to  (9.10.20) 
already  are  the  solutions.  In  nonlinear  cases  one  can  perform  an  iterative  pro¬ 
cedure,  which  we  now  discuss  in  detail  for  step  i  with  i  =  1,2,....  For  the 
functions  the  following  holds: 


-xu_ i)  +  --- 


t  —  Vi,i- 1)  H - 

n  Vn,i  — l) 

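With $A^{(i)}$, $B^{(i)}$, $c^{(i)}$ we denote the quantities $A$, $B$, $c$ evaluated at $x_{i-1}$, $\eta_{i-1}$. Furthermore let

$$\xi^{(i)} = x_i - x_{i-1} , \qquad \delta^{(i)} = \eta_i - \eta_{i-1} .$$

Then

$$A^{(i)}\xi^{(i)} + B^{(i)}\delta^{(i)} + c^{(i)} = 0 .$$

We now denote with

$$s^{(i)} = \sum_{\ell=1}^{i-1}\delta^{(\ell)}$$

the sum of the contributions of all previous steps to improve the measurements, and find for the difference between the measurements $y$ and the approximation $\eta_i$

$$y - \eta_i = y - (\eta_0 + s^{(i)} + \delta^{(i)}) = -(s^{(i)} + \delta^{(i)}) ,$$

since $\eta_0 = y$. The first term of the Lagrangian function is

$$(y - \eta_i)^{\mathrm T}G_y(y - \eta_i) = (s^{(i)} + \delta^{(i)})^{\mathrm T}G_y(s^{(i)} + \delta^{(i)})$$

and the full Lagrangian is

$$L = (s^{(i)} + \delta^{(i)})^{\mathrm T}G_y(s^{(i)} + \delta^{(i)}) + 2\mu^{(i)\mathrm T}\left(A^{(i)}\xi^{(i)} + B^{(i)}\delta^{(i)} + c^{(i)}\right) .$$

We can now proceed as above and get, with $G_B^{(i)} = \left(B^{(i)}G_y^{-1}B^{(i)\mathrm T}\right)^{-1}$,

$$\xi^{(i)} = -\left(A^{(i)\mathrm T}G_B^{(i)}A^{(i)}\right)^{-1}A^{(i)\mathrm T}G_B^{(i)}\left(c^{(i)} - B^{(i)}s^{(i)}\right) .$$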


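and

$$\delta^{(i)} = -s^{(i)} - G_y^{-1}B^{(i)\mathrm T}G_B^{(i)}\left(c^{(i)} - B^{(i)}s^{(i)} - A^{(i)}\left(A^{(i)\mathrm T}G_B^{(i)}A^{(i)}\right)^{-1}A^{(i)\mathrm T}G_B^{(i)}\left(c^{(i)} - B^{(i)}s^{(i)}\right)\right) .$$

For every step $i$ we compute

$$x_i = x_{i-1} + \xi^{(i)} , \qquad \eta_i = \eta_{i-1} + \delta^{(i)} , \qquad M_i = (y - \eta_i)^{\mathrm T}G_y(y - \eta_i) .$$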

The procedure is terminated as convergent if a new step does not yield an appreciable further reduction of $M_i$. The results of the iteration are denoted by $\tilde x$, $\tilde\eta$. The corresponding covariance matrices are given by (9.10.19) and (9.10.20) if the matrices $A$ and $B$ are evaluated at $\tilde x$, $\tilde\eta$. It is, of course, possible for the iteration process to diverge. In this case points (a) to (d), raised at the end of Sect. 9.7, should be considered.

In the following section we describe a different way to determine the solutions $\tilde x$, $\tilde\eta$. For that procedure, too, the formulas (9.10.19), (9.10.20), and (9.10.21) for the covariance matrices $G_{\tilde x}^{-1}$, $G_{\tilde\eta}^{-1}$ and the minimum function $M$ remain valid if the matrices $A$ and $B$ are evaluated at the position of the solution.

9.11  Algorithm  for  the  General  Case  of  Least  Squares 

In the Java class LsqGen, which treats the general case of least squares, we do not use the method of Lagrange multipliers but rather the procedure of Sect. A.18, which is based on orthogonal transformations.

At every step of the iteration we must determine the $r$-vector $\xi$ and the $n$-vector $\delta$. We combine both into an $(r+n)$-vector $u$,

$$u = \begin{pmatrix}\xi\\ \delta\end{pmatrix} . \tag{9.11.1}$$

The $m\times r$ matrix $A$ and the $m\times n$ matrix $B$ are also combined into an $m\times(r+n)$ matrix $E$,

$$E = (A, B) . \tag{9.11.2}$$

The vector $u$ containing the solutions must satisfy the constraint (9.10.7), i.e.,

$$Eu = d , \qquad d = -c . \tag{9.11.3}$$


We now consider the minimum function in the $i$th iterative step. It depends on

$$\delta = \delta_i , \tag{9.11.4}$$

where

$$s = \sum_{\ell=1}^{i-1}\delta_\ell \tag{9.11.5}$$

is the result of all of the previous steps that changed $y$. One then has

$$M = (\eta - y)^{\mathrm T}G_y(\eta - y) = (\delta + s)^{\mathrm T}G_y(\delta + s) = \min . \tag{9.11.6}$$


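We now extend the $n\times n$ matrix $G_y$ to an $(r+n)\times(r+n)$ matrix

$$G = \begin{pmatrix} 0 & 0\\ 0 & G_y \end{pmatrix} , \tag{9.11.7}$$

in which the upper-left zero block is $r\times r$. For this matrix we find a Cholesky decomposition according to Sect. A.9,

$$G = F^{\mathrm T}F . \tag{9.11.8}$$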


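Then (9.11.6) becomes

$$M = (u + t)^{\mathrm T}G(u + t) = (Fu + Ft)^2 = \min$$

or

$$(Fu - b)^2 = \min \tag{9.11.9}$$

with

$$t = \begin{pmatrix} 0\\ s\end{pmatrix} , \qquad b = -Ft , \tag{9.11.10}$$

where the zero block of $t$ has $r$ components and $s$ has $n$ components.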


Now one merely has to solve the problem (9.11.9) with the constraint (9.11.3), e.g., with the procedure of Sect. A.18. With the solution

$$\tilde u = \begin{pmatrix}\tilde\xi\\ \tilde\delta\end{pmatrix} ,$$

or rather with the vectors $\tilde\xi$, $\tilde\delta$, one finds improved values for $x$ [cf. (9.10.17)], for $\eta$ [cf. (9.10.18)], and for $s$ and $t$ [cf. (9.11.5) and (9.11.10)]. With these values an additional iterative step can be carried out.
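To illustrate (9.11.7) to (9.11.10), the Java sketch below builds a matrix $F$ with $F^{\mathrm T}F = G$ and evaluates $M$ as the squared length of $F(u + t)$. It is a hypothetical helper, not the code of LsqGen or of Sect. A.9. The upper-left $r\times r$ block of $G$ is zero, so $G$ is only positive semidefinite. For that reason the sketch decomposes $G_y = U^{\mathrm T}U$ (assumed positive definite) and embeds $U$ in the lower-right block of $F$. The constrained minimization itself (Sect. A.18) is not shown.

```java
/** Hypothetical sketch of (9.11.7)-(9.11.10); not the LsqGen implementation. */
final class ExtendedCholesky {

    /** Upper-triangular U with U^T U = g, for symmetric positive-definite g. */
    static double[][] cholesky(double[][] g) {
        int n = g.length;
        double[][] u = new double[n][n];
        for (int i = 0; i < n; i++) {
            double s = g[i][i];
            for (int k = 0; k < i; k++) s -= u[k][i] * u[k][i];
            if (s <= 0.0) throw new IllegalArgumentException("not positive definite");
            u[i][i] = Math.sqrt(s);
            for (int j = i + 1; j < n; j++) {
                double t = g[i][j];
                for (int k = 0; k < i; k++) t -= u[k][i] * u[k][j];
                u[i][j] = t / u[i][i];
            }
        }
        return u;
    }

    /** F = [[0, 0], [0, U]] of size (r+n) x (r+n), so that F^T F = G of (9.11.7). */
    static double[][] extendedF(double[][] gy, int r) {
        int n = gy.length;
        double[][] u = cholesky(gy);
        double[][] f = new double[r + n][r + n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) f[r + i][r + j] = u[i][j];
        return f;
    }

    /** M = (Fu - b)^2 with b = -Ft, i.e. the squared norm of F(u + t). */
    static double minimumFunction(double[][] f, double[] u, double[] t) {
        double m = 0.0;
        for (double[] row : f) {
            double v = 0.0;
            for (int j = 0; j < u.length; j++) v += row[j] * (u[j] + t[j]);
            m += v * v;
        }
        return m;
    }
}
```

Because the first $r$ rows and columns of $F$ vanish, $M$ does not depend on $\xi$, which agrees with (9.11.6).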

The procedure can be regarded as having converged, and is terminated, when the minimum function (9.11.9) changes by only an insignificant amount between two successive steps. It is terminated without success if convergence has not been reached after a given number of steps. In the case of convergence, the covariance matrix of the unknowns can be computed according to (9.10.19). The covariance matrix of the "improved" measurements $\tilde\eta$ can also be computed according to (9.10.20), although it is rarely of interest. Finally, the value of $M$ obtained in the last step can be used for a $\chi^2$-test of the goodness-of-fit with $m - r$ degrees of freedom.

All  these  operations  are  performed  by  the  class  LsqGen.  This  includes 
the  numerical  computation  of  the  derivatives  for  the  matrix  E.  The  user  only 
has  to  program  the  relation  (9.10.1),  which  depends  on  the  problem  at  hand. 
That  is  done  by  an  extension  of  the  abstract  class  DatanUserFunction. 

Example programs (Sect. 9.14), including implementations of such classes, exist for the following examples in this chapter.


Example  9.11:  Fitting  a  line  to  points  with  measurement  errors  in  both  the 
abscissa  and  ordinate 

Suppose a number of measured points $(t_i, s_i)$ in the $(t, s)$ plane are given. Each point has measurement errors $\Delta t_i$, $\Delta s_i$, which can, in general, be correlated. The covariance between the measurement errors $\Delta t_i$ and $\Delta s_i$ is $c_i$. We identify $t_i$ and $s_i$ with elements of the $n$-vector $y$ of measured quantities,

$$y_1 = t_1 , \quad y_2 = s_1 , \quad \ldots , \quad y_{n-1} = t_{n/2} , \quad y_n = s_{n/2} .$$


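The covariance matrix is

$$C_y = \begin{pmatrix}
(\Delta t_1)^2 & c_1 & 0 & 0 & \cdots\\
c_1 & (\Delta s_1)^2 & 0 & 0 & \cdots\\
0 & 0 & (\Delta t_2)^2 & c_2 & \cdots\\
0 & 0 & c_2 & (\Delta s_2)^2 & \cdots\\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} .$$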


A straight line in the $(t, s)$ plane is described by the equation $s = x_1 + x_2t$. For the assumption of such a line through the measured points, the equations of constraint (9.10.1) take on the form

$$f_k(x, \eta) = \eta_{2k} - x_1 - x_2\eta_{2k-1} = 0 , \qquad k = 1, 2, \ldots, n/2 . \tag{9.11.11}$$

Because of the term $x_2\eta_{2k-1}$, the equations are not linear. The derivatives with respect to $\eta_{2k-1}$ depend on $x_2$, and those with respect to $x_2$ depend on $\eta_{2k-1}$. The results of fitting to four measured points are shown in Fig. 9.12. The examples in the two individual plots differ only in the correlation coefficient of the third measurement, $\rho_3 = 0.5$ and $\rho_3 = -0.5$. The sign of $\rho_3$ has a noticeable effect on the fit result, and in particular on the value of the minimum function. ■
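The ingredients of Example 9.11 are easy to set up by hand. In the Java sketch below, the class and method names are hypothetical. The covariance $c_i$ is written as $\rho_i\,\Delta t_i\,\Delta s_i$, with correlation coefficient $\rho_i$. Indices are 0-based, so $\eta_{2k-1}$ and $\eta_{2k}$ are stored at positions $2(k-1)$ and $2(k-1)+1$. The derivative matrices $A = \partial f/\partial x$ and $B = \partial f/\partial\eta$ are computed analytically here, whereas LsqGen, according to the text, computes them numerically.

```java
/** Hypothetical sketch for Example 9.11: y = (t1, s1, t2, s2, ...). */
final class StraightLineConstraints {

    /** C_y: block diagonal with 2x2 blocks [[dt^2, c], [c, ds^2]], where c = rho*dt*ds. */
    static double[][] covariance(double[] dt, double[] ds, double[] rho) {
        int n = 2 * dt.length;
        double[][] cy = new double[n][n];
        for (int i = 0; i < dt.length; i++) {
            double c = rho[i] * dt[i] * ds[i];
            cy[2 * i][2 * i] = dt[i] * dt[i];
            cy[2 * i][2 * i + 1] = c;
            cy[2 * i + 1][2 * i] = c;
            cy[2 * i + 1][2 * i + 1] = ds[i] * ds[i];
        }
        return cy;
    }

    /** f_k = eta_{2k} - x1 - x2 * eta_{2k-1}, cf. (9.11.11); f_k is stored at index k-1. */
    static double[] constraints(double[] x, double[] eta) {
        double[] f = new double[eta.length / 2];
        for (int k = 0; k < f.length; k++)
            f[k] = eta[2 * k + 1] - x[0] - x[1] * eta[2 * k];
        return f;
    }

    /** A = df/dx (m x 2): df_k/dx1 = -1, df_k/dx2 = -eta_{2k-1}. */
    static double[][] derivA(double[] eta) {
        int m = eta.length / 2;
        double[][] a = new double[m][2];
        for (int k = 0; k < m; k++) {
            a[k][0] = -1.0;
            a[k][1] = -eta[2 * k];
        }
        return a;
    }

    /** B = df/deta (m x n): df_k/deta_{2k-1} = -x2, df_k/deta_{2k} = 1, zero otherwise. */
    static double[][] derivB(double[] x, int n) {
        double[][] b = new double[n / 2][n];
        for (int k = 0; k < n / 2; k++) {
            b[k][2 * k] = -x[1];
            b[k][2 * k + 1] = 1.0;
        }
        return b;
    }
}
```

The sketch shows the dependences noted above directly: the entries of $A$ contain $\eta_{2k-1}$, and those of $B$ contain $x_2$.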


Example  9.12:  Fixing  parameters 

Fig. 9.13 shows the results of fitting a line to measured points with errors in the abscissa and ordinate, where in each plot one parameter of the line was held fixed. In the upper plot the intercept $x_1$ with the vertical axis was fixed, and in the lower plot the slope $x_2$ was fixed. ■
[Fig. 9.12, upper plot: $x_1 = 0.067$, $x_2 = 0.643$, $M = 3.108$, NSTEP = 4; $\Delta x_1 = 0.086$, $\Delta x_2 = 0.113$, $\rho = -0.835$. Lower plot: $x_1 = 0.054$, $x_2 = 0.620$, $M = 1.668$, NSTEP = 4; $\Delta x_1 = 0.087$, $\Delta x_2 = 0.114$, $\rho = -0.801$. Horizontal axis: $t$.]

Fig. 9.12: Fitting a line to four measured points in the $(t, s)$ plane. The points are shown with measurement errors (in $t$ and $s$) and covariance ellipses. The individual plots show the results of the fit, the errors, and the correlation coefficients.


9.12  Applying  the  Algorithm  for  the  General  Case 
to  Constrained  Measurements 

If all of the variables $x_1, \ldots, x_r$ are fixed, then there are no more unknowns in the least-squares problem.

In the equations of constraint (9.10.2) only the components of $\eta$ are variable. Thus all terms containing the matrix $A$ vanish from the formulas of the previous sections. As before, however, the improved measurements $\tilde\eta$ can be computed. In addition, the quantity $M$ can be used for a $\chi^2$-test with $m$ degrees of freedom of how well the measurements satisfy the equations of constraint. Finally, the covariance matrix $C_{\tilde\eta}$ of the improved measurements can be determined.

[Fig. 9.13, upper plot: $x_1 = 0.200$, $x_2 = 0.505$, $M = 6.263$, NSTEP = 4, $\Delta x_2 = 0.062$. Lower plot: $x_1 = 0.150$, $x_2 = 0.500$, $M = 5.000$, NSTEP = 2, $\Delta x_1 = 0.044$. Horizontal axis: $t$.]

Fig. 9.13: Fitting a line to the same points as in the first plot of Fig. 9.12. In the upper plot the intercept $x_1$ with the vertical axis was held constant; in the lower plot the slope $x_2$ was held constant.

Mathematically, as well as when using the program LsqGen, it does not matter whether all variables are fixed ($r \neq 0$, $r' = 0$) or whether the equations of constraint do not depend on the variables $x$ from the start ($r = 0$). In both cases LsqGen gives the same solution.
Example 9.13: $\chi^2$-test of the description of measured points with errors in abscissa and ordinate by a given line

We use the same measurements as in Examples 9.11 and 9.12. The results of analyzing these measurements with LsqGen with fixed parameters are shown in Fig. 9.14. For the upper plot, $x_1$ and $x_2$ were fixed to the values obtained from fitting with two adjustable parameters in Example 9.11, Fig. 9.12. As expected, we obtain the same value of $M$ as before. For the lower plot, arbitrarily chosen crude estimates ($x_1 = 0$, $x_2 = 0.5$) were used. They give a significantly higher value of $M$. This value would lead to rejection, at a confidence level of 99 %, of the hypothesis that the data points are described by a linear relation with these parameter values ($\chi^2_{0.99} = 13.28$ for four degrees of freedom).

It is also interesting to consider the improved measurements $\tilde\eta$ and their errors, which are shown in Fig. 9.14. The improved measurements naturally lie on the line. Their errors are the square roots of the diagonal elements of $C_{\tilde\eta}$. The correlations between the errors in $s$ and $t$ of an improved point are obtained from the corresponding off-diagonal elements of $C_{\tilde\eta}$. They are exactly equal to one. The covariance ellipses of the individual improved measurements collapse to line segments that lie on the line given by $x_1$, $x_2$.
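The $\chi^2$ statements in Examples 9.11 to 9.13 can be checked with a small Java sketch. For an even number $k$ of degrees of freedom, the upper tail of the $\chi^2$ distribution has the closed form $P(\chi^2 > x) = e^{-x/2}\sum_{j=0}^{k/2-1}(x/2)^j/j!$. The sketch below, with a hypothetical class name, implements only this even case. It is not one of the book's classes.

```java
/** Hypothetical sketch: upper-tail probability of chi-square for an even number of dof. */
final class ChiSquareTail {

    /** P(chi2 > x) = exp(-x/2) * sum_{j=0}^{dof/2-1} (x/2)^j / j!, valid for even dof only. */
    static double upperTail(double x, int dof) {
        if (dof <= 0 || dof % 2 != 0) throw new IllegalArgumentException("even dof >= 2 required");
        double h = 0.5 * x, term = 1.0, sum = 1.0;
        for (int j = 1; j < dof / 2; j++) {
            term *= h / j;   // (x/2)^j / j!
            sum += term;
        }
        return Math.exp(-h) * sum;
    }
}
```

For $M = 16.417$ with four degrees of freedom (lower plot of Fig. 9.14), the sketch gives a tail probability of about 0.0025, which is below 0.01. For the quantile $\chi^2_{0.99} = 13.28$ it returns approximately 0.010, as expected.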


9.13  Confidence  Region  and  Asymmetric  Errors 
in  the  General  Case 

The  results  obtained  in  Sect.  9.8  on  confidence  regions  and  asymmetric  errors 
are  also  valid  for  the  general  case.  We  therefore  limit  ourselves  to  stating  that 
asymmetric  errors  are  computed  by  the  class  LsqAsg  and  to  presenting  an 
example. 

Example  9.14:  Asymmetric  errors  and  confidence  region  for  fitting  a 
straight  line  to  measured  points  with  errors  in  the  abscissa  and 
ordinate 

Figure 9.15 shows the result of fitting to four points with large measurement errors. From Sect. 9.8 we already know that large measurement errors can lead to asymmetric errors in the fitted parameters. In fact, we see highly asymmetric errors and large differences between the covariance ellipse and the corresponding confidence region. ■
[Fig. 9.14, upper plot: $x_1 = 0.067$, $x_2 = 0.643$, $M = 3.108$, NSTEP = 2. Lower plot: $x_1 = 0.000$, $x_2 = 0.500$, $M = 16.417$, NSTEP = 2. Horizontal axis: $t$.]

Fig. 9.14: Constrained measurements. The hypothesis tested is that the true values of the measured points, indicated by their covariance ellipses, lie on the line $s = x_1 + x_2t$. With the numerical values of $M$, a $\chi^2$-test with four degrees of freedom can be carried out. Also shown are the improved measurements, which lie on the line, and their errors.


9.14  Java  Classes  and  Example  Programs 

Java  Classes  for  Least-Squares  Problems 

LsqPol  handles  the  fitting  of  a  polynomial  (Sect.  9.4.1). 

LsqLin  handles  the  linear  case  of  indirect  measurements  (Sect.  9.4.2). 
LsqNon  handles  the  nonlinear  case  of  indirect  measurements  (Sect.  9.6.1). 
[Fig. 9.15, upper plot: $x_1 = 0.060$, $x_2 = 1.157$, $M = 0.872$, NSTEP = 6; $\Delta x_1 = 0.520$, $\Delta x_2 = 0.779$, $\rho = -0.881$; horizontal axis $t$. Lower plot: horizontal axis $x_1$.]

Fig. 9.15: (Above) Measured points with covariance ellipses and fitted line. (Below) Result of the fit, given in the plane spanned by the parameters $x_1$, $x_2$. Shown are the fitted parameter values (circle), symmetric errors (crossed bars), covariance ellipse, asymmetric errors (horizontal and vertical lines), and the confidence region (dark contour).


LsqMar  handles  the  nonlinear  case  using  Marquardt’s  method  (Sect.  9.6.2). 

LsqAsn  yields  asymmetric  errors  or  confidence  limits  in  the  nonlinear  case 
(Sect.  9.8). 

LsqAsm  yields  asymmetric  errors  or  confidence  limits  using  Marquardt’s 
method. 

LsqGen handles the general case of least squares (Sect. 9.11).

LsqAsg yields asymmetric errors or confidence limits in the general case (Sect. 9.13).

Example Program 9.1: The class E1Lsq demonstrates the use of LsqPol

The short program uses the data of Table 9.3 and computes vectors $x$ of coefficients and their covariance matrix $C_x$ for $r = 1, 2, 3, 4$. Here $r - 1$ is the degree of the polynomial, i.e., $r$ is the number of elements in $x$. The results are presented numerically.

Suggestions: (a) Modify the program (by modifying a single statement) so that the cases $r = 1, 2, \ldots, 10$ are treated. Which peculiarity do you expect for $r = 10$? (b) Instead of the data of Table 9.3, use different data determined without error by a polynomial, e.g., $y = t^2$, and let the program determine the parameters of the polynomial from the data. Use different (although obviously incorrect) sets of values for the errors $\Delta y_i$, e.g., $\Delta y_i = \sqrt{y_i}$ for one run through the program and $\Delta y_i = 1$ for another run. What is the influence of the choice of $\Delta y_i$ on the coefficients $x$, on the minimum function, and on the covariance matrix?

Example  Program  9.2:  The  class  E2Lsq  demonstrates  the  use  of  LsqLin 

The program uses the data of Fig. 9.4 and sets up the matrix $A$ and the vector $c$, which are needed for the fit of a proportional relation $y = x_1t$ to the data. Next the fit is performed by calling LsqLin. The results are displayed numerically.

Suggestions: (a) Modify the program so that a first-degree polynomial is fitted in addition. Set up the matrix $A$ yourself, i.e., do not use LsqPol. Compare your result with Fig. 9.4. (b) Display the results graphically as in Fig. 9.4.

Example  Program  9.3:  The  class  E3Lsq  demonstrates  the  use  of  LsqNon 

The program solves the following problem. First, 20 pairs of values $(t_i, y_i)$ are generated. The values $t_i$ of the controlled variable are $1/21, 2/21, \ldots, 20/21$. The values $y_i$ are given by

$$y_i = x_1\exp\left(-(t_i - x_2)^2/2x_3^2\right) + \varepsilon_i .$$

Here $\varepsilon_i$ is an error taken from a normal distribution with expectation value zero and width $\sigma_i$. The width $\sigma_i$ is different from point to point. It is taken from a uniform distribution with the limits $\sigma/2$ and $3\sigma/2$. Thus the $y_i$ are points scattered within their errors around a Gaussian curve defined by the parameters $x = (x_1, x_2, x_3)$. The widths of the error distributions are known, i.e., $\Delta y_i = \sigma_i$. The data points are generated for the parameter values $x_1 = 1$, $x_2 = 1.2$, $x_3 = 0.4$. (They are identical with those shown in Fig. 9.5.) The program then successively performs four different fits of a Gaussian curve to the data. In the first fit all three parameters are considered variable. Then successively one, two, and finally all parameters are fixed. Before each fit, first approximations for the variables are set; for the fixed parameters these values are, of course, not modified during the fitting procedure. The results are presented numerically.

Suggestions: (a) Choose different first approximations for the non-fixed parameters and observe the influence of the choice on the results. (b) Obtain first approximations by computation, e.g., by the procedure described in Example 9.4, in which a parabola is fitted to the logarithms of the data. (c) Modify the program by adding graphical output corresponding to Fig. 9.5.
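The data generation described above can be sketched in a few lines of Java. The class name `GaussData` is hypothetical, and this is not the code of E3Lsq. The overall scale $\sigma$ is not stated in the text, so it is left as a parameter. `java.util.Random` supplies both the uniform widths and the normally distributed errors.

```java
import java.util.Random;

/** Hypothetical sketch of the data generation described for E3Lsq (not its actual code). */
final class GaussData {

    /** Returns {t, y, sigma} for t_i = i/(n+1), i = 1..n, and parameters x = (x1, x2, x3). */
    static double[][] generate(int n, double[] x, double sigma, long seed) {
        Random rnd = new Random(seed);
        double[] t = new double[n], y = new double[n], sig = new double[n];
        for (int i = 0; i < n; i++) {
            t[i] = (i + 1) / (double) (n + 1);
            sig[i] = sigma * (0.5 + rnd.nextDouble());   // uniform in [sigma/2, 3 sigma/2)
            double d = t[i] - x[1];
            double eta = x[0] * Math.exp(-d * d / (2.0 * x[2] * x[2]));
            y[i] = eta + sig[i] * rnd.nextGaussian();     // error with mean 0 and width sig[i]
        }
        return new double[][] {t, y, sig};
    }
}
```

A call such as `GaussData.generate(20, new double[] {1.0, 1.2, 0.4}, sigma, seed)` gives $t_i = i/21$, as in the text. The value of `sigma` is an arbitrary choice in this sketch.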
Example Program 9.4: The class E4Lsq demonstrates the use of LsqMar

The program solves the problem of Example 9.7. First a set of 50 data points $(t_i, y_i)$ is generated. They are scattered according to their errors around a curve corresponding to the sum of a second-degree polynomial and two Gaussians, cf. (9.6.6). The nine parameters of this function are combined to form the vector $x$. The measurement errors $\Delta y_i$ are generated by the procedure described in Sect. 9.3. The data points are generated for predefined values of $x$. Next, LsqMar is called (with significantly different values of $x$ as first approximations), and solutions $\tilde x$ are obtained by a fit to the data points. The results are presented numerically and also in graphical form.

Suggestions: (a) Fix all parameters except for $x_5$ and $x_8$, which determine the mean values of the two Gaussians used for generating the data. (In the program, as is customary in Java, where indexing begins with 0, they are denoted by x[4] and x[7].) Allow for interactive input of $x_5$ and $x_8$. For different input values of $(x_5, x_8)$, e.g., (10, 30), (19, 20), (19, 15), (10, 11), try to determine whether you can still separate the two Gaussians by the fit, i.e., whether you obtain significantly different values for $x_5$ and $x_8$, considering their errors $\Delta x_5$, $\Delta x_8$. (b) Repeat (a), but for smaller measurement errors. Choose, e.g., $\sigma = 0.1$ or $0.01$ instead of $\sigma = 0.4$.

Example  Program  9.5:  The  class  E5Lsq  demonstrates  the  use  of  LsqAsn 

The program solves the problem of Example 9.8. First, pairs $(t_i, y_i)$ of data are produced. The $y_i$ are scattered according to their measurement errors around a curve given by the function $\eta(t_i) = x_1\exp(-x_2t_i)$, cf. (9.6.4). The measurement errors $\Delta y_i$ are generated by the procedure already used in Sect. 9.3. Starting from a given first approximation for the parameters $x_i$, values $\tilde x$ of these parameters are fitted to the data using LsqNon. Finally the asymmetric errors are found using LsqAsn. Two plots are produced. One shows the measurements and the fitted curve. The second displays, in the $(x_1, x_2)$ plane, the fitted parameters with symmetric and asymmetric errors, covariance ellipse, and confidence region.

Example  Program  9.6:  The  class  E6Lsq  demonstrates  the  use  of  LsqGen 

The program fits a straight line to points with measurement errors in the abscissa and ordinate, i.e., it solves the problem of Example 9.11. From the measured values, their errors, and their covariances, the vector $y$ and the covariance matrix $C_y$ are set up. The two parameters $x_1$ (ordinate intercept) and $x_2$ (slope) are determined with LsqGen. For this the function (9.11.11) is needed. It is implemented in the method getValue of the subclass StraightLine of E6Lsq, which itself is an extension of the abstract class DatanUserFunction. The first approximations of $x_1$ and $x_2$ are obtained by constructing a straight line through the two outermost points. A loop extends over the two cases of Fig. 9.12. The results are shown numerically and graphically.

Example  Program  9.7:  The  class  E7Lsq  demonstrates  the  use  of  LsqGen 
with  some  variables  fixed 

The program also treats Example 9.11. Again the first approximations of $x_1$ and $x_2$ are obtained by constructing a straight line through the two outer measured points. A loop extends over two cases. In the first, $x_1$ is fixed at $x_1 = 0.2$; in the second, $x_2$ is fixed at $x_2 = 0.5$. The results correspond to Fig. 9.13.

Example Program 9.8: The class E8Lsq demonstrates the use of LsqGen with all variables fixed and produces a graphical representation of improved measurements

The  problem  of  Example  9.13  is  solved.  The  results  are  those  of  Fig.  9.14. 

Example  Program  9.9:  The  class  E9Lsq  demonstrates  the  use  of  LsqAsg 
and  draws  the  confidence-region  contour  for  the  fitted  variables 

Here,  the  problem  of  Example  9.14  is  solved.  The  results  are  those  of  Fig.  9.15. 


10.  Function  Minimization 


Locating extreme values (maxima and minima) is particularly important in data analysis. This task occurs in solving the least-squares problem in the form $M(x, y) = \min$ and in the maximum-likelihood problem as $L = \max$. By means of a simple change of sign, the latter problem can also be treated as locating a minimum. We therefore always speak of minimization.


10.1  Overview:  Numerical  Accuracy 


We consider first the simple quadratic form

$$M(x) = c - bx + \tfrac12Ax^2 . \tag{10.1.1}$$

It has an extremum where the first derivative vanishes,

$$\frac{dM}{dx} = 0 = -b + Ax , \tag{10.1.2}$$

that is, at the value

$$x_m = \frac{b}{A} . \tag{10.1.3}$$

With $M(x_m) = M_m$, Eq. (10.1.1) can easily be put into the form

$$M(x) - M_m = \tfrac12A(x - x_m)^2 . \tag{10.1.4}$$

Although the function whose minimum we want to find does not usually have the simple form of (10.1.1), it can nevertheless be approximated by a quadratic form in the region of the minimum. There one has the Taylor expansion around the point $x_0$,

$$M(x) = M(x_0) - b(x - x_0) + \tfrac12A(x - x_0)^2 + \cdots , \tag{10.1.5}$$

where

$$b = -M'(x_0) , \qquad A = M''(x_0) . \tag{10.1.6}$$

In this approximation the minimum is given by the point where the derivative $M'(x)$ is zero, i.e.,

$$x_{mp} = x_0 + \frac{b}{A} . \tag{10.1.7}$$

This holds only if $x_0$ is sufficiently close to the minimum that terms of order higher than quadratic in (10.1.5) can be neglected. The situation is depicted in Fig. 10.1. The function $M(x)$ has a minimum at $x_m$, maxima at $x_M$, and points of inflection at $x_s$. For the second derivative in the region $x > x_m$ one has $M''(x) > 0$ for $x < x_s$ and $M''(x) < 0$ for $x > x_s$. If we now choose $x_0$ to be in the region $x_m < x_0 < x_M$, then the first derivative $M'(x)$ there is always positive. Therefore $x_{mp}$ lies closer to $x_m$ only if $x_0 < x_s$. Clearly the point $x_0$ is in general not chosen arbitrarily, but rather as close as possible to where the minimum is expected to be. We can call this estimated value the zeroth approximation of $x_m$. Various strategies are available to obtain successively better approximations:


Fig. 10.1: The function $M(x)$ has a minimum at $x_m$, maxima at the points $x_M$, and points of inflection at $x_s$.


(i) Use of the function and its first and second derivatives at $x_0$.

One computes $x_{mp}$ according to (10.1.7) and takes $x_{mp}$ as a first approximation, i.e., $x_0$ is replaced by $x_{mp}$. Repeated application of (10.1.7) then gives a second approximation. The procedure is repeated until two successive approximations differ by less than a given value $\varepsilon$. From the discussion above it follows that this procedure does not converge if the zeroth approximation lies outside the points of inflection in Fig. 10.1.

(ii) Use of the function and its first derivative at $x_0$.

The sign of the derivative $M'(x_0) = -b$ at the point $x_0$ determines the direction in which the function increases. It is assumed that one should search for the minimum in the direction in which the function decreases. The quantity

$$x_1 = x_0 + b \tag{10.1.8}$$

is computed, i.e., one replaces the second derivative by the value unity.

Instead of $x_0$ one now uses $x_1$ as the approximation, and so forth. Alternatively, instead of (10.1.8) one can also use the rule

$$x_1 = x_0 + cb , \tag{10.1.9}$$

where $c$ is an arbitrary positive constant. Both rules ensure that the step from $x_0$ to $x_1$ proceeds in the direction of the minimum. If in addition one chooses $c$ to be small, then the step is small, so that it does not go beyond (or not far beyond) the minimum.

(iii)  Use  of  the  function  at  various  points. 

The  procedure  (i)  can  be  carried  out  without  knowing  the  derivative  of 
the  function  if  the  function  itself  is  known  at  three  points.  One  can  then 
uniquely  fit  a  parabola  through  these  three  points  and  take  the  extreme 
value  as  an  approximation  of  the  minimum  of  the  function.  One  speaks 
of  locating  the  minimum  by  quadratic  interpolation.  It  is,  however,  by 
no  means  certain  that  the  extreme  value  of  the  parabola  is  a  minimum 
and  not  a  maximum.  As  in  procedure  (i)  it  is  therefore  important  that 
the  three  chosen  points  are  already  in  the  region  of  the  minimum  of  the 
function. 

(iv) Successive reduction of an interval containing the minimum.

None of the procedures discussed up to this point could guarantee that the minimum of the function would actually be found. The minimum can be found with certainty provided one knows an interval $x_a < x < x_b$ containing the minimum. If such an interval is known, one can locate the minimum with arbitrary accuracy by successively subdividing the interval and checking in which subinterval the minimum is located.
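Strategies (i) and (ii) can be compared on a simple test function. The Java sketch below uses hypothetical names. The test function $M(x) = -\exp(-(x-1)^2)$ is our own choice; it has its minimum at $x = 1$ and points of inflection at $1 \pm 1/\sqrt2$. The derivatives are supplied analytically. The iteration stops when a step is shorter than $\varepsilon$ and returns NaN if that does not happen within a fixed number of steps.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of strategies (i) and (ii) for one variable. */
final class OneDimSteps {

    /** (i): x <- x + b/A with b = -M'(x), A = M''(x), cf. (10.1.7). NaN if not converged. */
    static double newton(DoubleUnaryOperator d1, DoubleUnaryOperator d2,
                         double x0, double eps, int maxIter) {
        double x = x0;
        for (int it = 0; it < maxIter; it++) {
            double step = -d1.applyAsDouble(x) / d2.applyAsDouble(x);
            x += step;
            if (Math.abs(step) < eps) return x;
        }
        return Double.NaN;
    }

    /** (ii): x <- x + c*b with b = -M'(x) and a fixed c > 0, cf. (10.1.9). NaN if not converged. */
    static double fixedStep(DoubleUnaryOperator d1, double x0, double c,
                            double eps, int maxIter) {
        double x = x0;
        for (int it = 0; it < maxIter; it++) {
            double step = -c * d1.applyAsDouble(x);
            x += step;
            if (Math.abs(step) < eps) return x;
        }
        return Double.NaN;
    }
}
```

Started at $x_0 = 1.3$, inside the points of inflection, the Newton iteration (i) converges to $x = 1$ within a few steps. Started at $x_0 = 2$, beyond $1 + 1/\sqrt2 \approx 1.71$, it moves away from the minimum and returns NaN. The fixed-step rule (ii) with $c = 0.5$ still reaches $x = 1$ from $x_0 = 2$, because its step always points downhill.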

In Sects. 10.2-10.7 we examine the minimization of a function of only one variable. Sect. 10.2 gives the formula for a parabola determined by three points. Sect. 10.3 shows that minimizing a function of one variable is equivalent to minimizing a function of $n$ variables on a line in an $n$-dimensional space. Section 10.4 describes a procedure for locating an interval containing the minimum. Sect. 10.5 describes a minimum search by means of interval division. In Sect. 10.6 this is combined with quadratic interpolation in such a way that the interpolation is used only when it leads quickly to the minimum of the function. If this is not the case, one continues with interval division. In this way we obtain a procedure with the certainty of interval division combined with, as far as possible, the speed of quadratic interpolation. The same procedure can also be used for a function of $n$ variables if the search for the minimum is restricted to a line in the $n$-dimensional space. This problem is addressed in Sect. 10.7.
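The two ingredients combined in Sects. 10.5 and 10.6 can be sketched in Java as follows. This sketch rests on our own assumptions and is not the book's code. The parabola vertex uses the standard three-point formula (Sect. 10.2 gives the book's form of it). The sign of the second divided difference shows whether that extremum is a minimum or a maximum. The interval reduction is a plain golden-section search, one common way of dividing the interval, and it assumes that the starting interval contains a single minimum. The class name is hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of (iii) quadratic interpolation and (iv) interval reduction. */
final class LineSearch1D {

    /** Abscissa of the extremum of the parabola through (a,fa), (b,fb), (c,fc). */
    static double parabolaVertex(double a, double fa, double b, double fb, double c, double fc) {
        double p = (b - a) * (b - a) * (fb - fc) - (b - c) * (b - c) * (fb - fa);
        double q = (b - a) * (fb - fc) - (b - c) * (fb - fa);
        return b - 0.5 * p / q;
    }

    /** Second divided difference: > 0 means the extremum is a minimum, < 0 a maximum. */
    static double curvature(double a, double fa, double b, double fb, double c, double fc) {
        return ((fc - fb) / (c - b) - (fb - fa) / (b - a)) / (c - a);
    }

    /** Golden-section reduction of [xa, xb], assumed to contain a single minimum. */
    static double goldenSection(DoubleUnaryOperator m, double xa, double xb, double eps) {
        final double g = (Math.sqrt(5.0) - 1.0) / 2.0;   // about 0.618
        double x1 = xb - g * (xb - xa), x2 = xa + g * (xb - xa);
        double f1 = m.applyAsDouble(x1), f2 = m.applyAsDouble(x2);
        while (xb - xa > eps) {
            if (f1 < f2) {   // minimum lies in [xa, x2]; old x1 becomes the new x2
                xb = x2; x2 = x1; f2 = f1;
                x1 = xb - g * (xb - xa); f1 = m.applyAsDouble(x1);
            } else {         // minimum lies in [x1, xb]; old x2 becomes the new x1
                xa = x1; x1 = x2; f1 = f2;
                x2 = xa + g * (xb - xa); f2 = m.applyAsDouble(x2);
            }
        }
        return 0.5 * (xa + xb);
    }
}
```

For $M(x) = (x-2)^2 + 1$ sampled at $x = 0, 1, 4$, the vertex formula returns exactly 2. The golden-section search on $[0, 5]$ narrows the interval around $x = 2$ to the requested width, with one new function value per step.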

Next we turn to the task of searching for the minimum of a function of $n$ variables. We begin in Sect. 10.8 with the particularly elegant simplex method. This is followed by the discussion of various procedures of successive minimization along fixed directions in the $n$-dimensional space. These directions can simply be the directions of the coordinates (Sect. 10.9). Alternatively, for a function that depends only quadratically on the variables, they can be chosen such that the minimum is reached in at most $n$ steps (Sects. 10.10 and 10.11).

Finally we discuss a procedure of $n$-dimensional minimization that employs aspects of methods (i) and (ii) of the one-dimensional case. If $x$ is the $n$-dimensional vector of the variables, then the general quadratic form, i.e., the generalization of (10.1.1) to $n$ variables, is

$$M(x) = c - b\cdot x + \tfrac12x^{\mathrm T}Ax = c - \sum_k b_kx_k + \tfrac12\sum_{k,l}x_kA_{kl}x_l . \tag{10.1.10}$$


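Here $A$ is a symmetric matrix, $A_{kl} = A_{lk}$. The partial derivative with respect to $x_i$ is

$$\frac{\partial M}{\partial x_i} = -b_i + \tfrac12\sum_k x_kA_{ki} + \tfrac12\sum_l A_{il}x_l = -b_i + \sum_l A_{il}x_l . \tag{10.1.11}$$

Expressing all of the partial derivatives as a vector $\nabla M$ gives

$$\nabla M = -b + Ax . \tag{10.1.12}$$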


At the minimum point the vector of derivatives vanishes. The minimum is therefore located at

$$x_m = A^{-1}b \tag{10.1.13}$$

in analogy to (10.1.3).
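In practice, (10.1.13) is evaluated by solving the linear system $Ax_m = b$ rather than by forming $A^{-1}$ explicitly. The Java sketch below, with a hypothetical class name, uses Gaussian elimination with partial pivoting. It assumes that $A$ is symmetric and positive definite, so that the extremum is a minimum, and it checks the result through the gradient (10.1.12).

```java
/** Hypothetical sketch: minimum of M(x) = c - b.x + x^T A x / 2 via A x = b, cf. (10.1.13). */
final class QuadraticMinimum {

    /** Solves A x = b by Gaussian elimination with partial pivoting; inputs are not modified. */
    static double[] solve(double[][] a0, double[] b0) {
        int n = b0.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) a[i] = a0[i].clone();
        double[] b = b0.clone();
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            double[] tr = a[col]; a[col] = a[piv]; a[piv] = tr;
            double tb = b[col]; b[col] = b[piv]; b[piv] = tb;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int j = col; j < n; j++) a[r][j] -= f * a[col][j];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {   // back substitution
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    /** Gradient (10.1.12): grad M = -b + A x. */
    static double[] gradient(double[][] a, double[] b, double[] x) {
        double[] g = new double[b.length];
        for (int i = 0; i < b.length; i++) {
            g[i] = -b[i];
            for (int l = 0; l < x.length; l++) g[i] += a[i][l] * x[l];
        }
        return g;
    }
}
```

For $A = \begin{pmatrix}4&1\\1&3\end{pmatrix}$ and $b = (1, 2)$, the sketch returns $x_m = (1/11, 7/11)$, at which $\nabla M$ vanishes to rounding accuracy.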
Clearly the function $M(x)$ does not in general have the simple form of (10.1.10). We can, however, expand it in a series around the point $x_0$,

$$M(x) = M(x_0) - b\cdot(x - x_0) + \tfrac12(x - x_0)^{\mathrm T}A(x - x_0) + \cdots , \tag{10.1.14}$$

with the negative gradient

$$b = -\nabla M(x_0) , \quad \text{i.e.,} \quad b_i = -\left.\frac{\partial M}{\partial x_i}\right|_{x=x_0} , \tag{10.1.15}$$

and the Hessian matrix of second derivatives

$$A_{ik} = \left.\frac{\partial^2 M}{\partial x_i\,\partial x_k}\right|_{x=x_0} . \tag{10.1.16}$$


The  series  (10.1.14)  is  the  starting  point  for  various  minimization  procedures: 


(i)  Minimization  in  the  direction  of  the  gradient. 

Starting  from  the  point  xo  one  searches  for  the  minimum  along  the 
direction  given  by  the  gradient  VM(x o)  and  calls  the  point  where  it 
is  found  xi.  Starting  from  xi  one  looks  for  the  minimum  along  the 
direction  \  Mix i),  and  so  forth.  We  will  discuss  this  procedure,  called 
minimization  in  the  direction  of  steepest  descent,  in  Sect.  10.12. 

(ii)  Step  of  given  size  in  the  gradient  direction. 

One  computes  in  analogy  to  (10.1.9) 

X!=xo  +  cb  ,  b  =  —  VM(xo)  ,  (10.1.17) 

with  a  given  positive  c.  That  is,  one  takes  a  step  in  the  direction  of 
steepest  descent  of  the  function,  without,  however,  searching  exactly 
for  the  minimum  in  this  direction.  Next  one  computes  the  gradient  at 
Xi,  steps  from  xi  in  the  direction  of  this  gradient,  etc.  In  Sect.  10.13  we 
will  combine  this  method  with  the  following  one. 

(iii) Use of the gradient and the Hessian matrix at $\mathbf{x}_0$.

If we truncate (10.1.14) after the quadratic term, we obtain a function
whose minimum is, according to (10.1.13), given by

$$ \mathbf{x}_{mp} = \mathbf{x}_0 + A^{-1}\mathbf{b} . \tag{10.1.18} $$

We take $\mathbf{x}_1 = \mathbf{x}_{mp}$ as the first approximation, compute for this point the
gradient and Hessian matrix, obtain by corresponding use of (10.1.18)
the second approximation, and so forth. This procedure is discussed
in Sect. 10.14. It converges quickly if the zeroth approximation $\mathbf{x}_0$ is
sufficiently close to the minimum. If that is not the case, however, then,
as for the corresponding one-dimensional procedure, it gives no
reasonable solution. We will therefore combine it in Sect. 10.15 with
method (ii), in order to obtain the speed of (iii) when possible and the
certainty of (ii) when necessary.
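Method (iii) can be tried out in two dimensions, where the $2 \times 2$ Hessian can be inverted in closed form. In the Java sketch below the class name and the example function $M(x, y) = e^{x} - 2x + (y - 3)^2$, with its minimum at $(\ln 2, 3)$, are hypothetical choices for illustration; each step applies (10.1.18) with the gradient and Hessian recomputed at the current point.

```java
/** Method (iii) in two dimensions; an illustrative sketch with a hypothetical example function. */
public class NewtonStepSketch {
    /** One step x1 = x0 + A^{-1} b, Eq. (10.1.18), with b = -grad M(x0) and A the Hessian at x0. */
    public static double[] step(double[] x0, double[] grad, double[][] hess) {
        double b0 = -grad[0], b1 = -grad[1];
        double det = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0];
        // closed-form inverse of the 2x2 Hessian applied to b
        double d0 = ( hess[1][1] * b0 - hess[0][1] * b1) / det;
        double d1 = (-hess[1][0] * b0 + hess[0][0] * b1) / det;
        return new double[] {x0[0] + d0, x0[1] + d1};
    }

    // Hypothetical example: M(x, y) = exp(x) - 2x + (y - 3)^2, minimum at (ln 2, 3).
    static double[] gradient(double[] p) {
        return new double[] {Math.exp(p[0]) - 2.0, 2.0 * (p[1] - 3.0)};
    }

    static double[][] hessian(double[] p) {
        return new double[][] {{Math.exp(p[0]), 0.0}, {0.0, 2.0}};
    }

    /** Repeats the step, recomputing gradient and Hessian at each new point. */
    public static double[] iterate(double[] x0, int steps) {
        double[] x = x0.clone();
        for (int k = 0; k < steps; k++) x = step(x, gradient(x), hessian(x));
        return x;
    }
}
```

Starting from $(0, 0)$, a handful of steps reach the minimum to machine precision. Starting instead from $x = -5$, the first step jumps to about $x \approx 291$, and the iteration then creeps back by roughly one unit per step. This illustrates why method (iii) is combined with method (ii) when the starting point is far from the minimum.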

In Sects. 10.8 through 10.15 very different methods for solving the same
problem, the minimization of a function of n variables, will be discussed.
In Sect. 10.16 we give information on how to choose one of the methods
appropriate for the problem in question. Section 10.17 is dedicated to
considerations of errors. In Sect. 10.18 several examples are discussed in detail.

Before we find the minimum $x_m$ of a function, we would like to inquire
briefly about the numerical accuracy we expect for $x_m$. The minimum is, after
all, almost always determined by a comparison of values of the function at
points close in $x$. If we solve (10.1.4) for $(x - x_m)$, we obtain

$$ x - x_m = \sqrt{\frac{2\,[M(x) - M(x_m)]}{A}} . $$


We assume that $A$, i.e., the second derivative of the function $M$, is of order of
magnitude unity close to the minimum. (This need only be true approximately.
In fact, in numerical calculations one always scales all of the quantities such
that they are of order unity, i.e., not something like $10^{6}$ or $10^{-6}$.) If we
compute the function $M$ with the precision $\delta$, then the difference $M(x) - M(x_m)$
is also known at best with a precision of $\delta$, i.e., two function values cannot
be considered significantly different if they differ by only $\delta$. For the
corresponding $x$ values one then has

$$ x - x_m \approx \sqrt{2\delta/A} \approx \sqrt{\delta} . \tag{10.1.19} $$

The position of the minimum can therefore be determined only with a precision of the order of $\sqrt{\delta}$.


If the computer has $n$ binary places available for representing the
mantissa, then a value $x$ can be represented with the precision (4.2.7)

$$ \frac{\Delta x}{|x|} \approx 2^{-n} . $$

For computing a value $x$ one therefore chooses a relative precision

$$ \varepsilon \ge 2^{-n} , $$

since it is clearly pointless to try to compute a value with a higher precision
than that with which it can be represented. If $x$ is computed iteratively, i.e., one
computes a series $x_0, x_1, \ldots$ of approximations for $x$, then one can truncate
this series as soon as

$$ \frac{|x_k - x_{k-1}|}{|x_k|} < \varepsilon \qquad \text{or} \qquad |x_k - x_{k-1}| < \varepsilon\,|x_k| \tag{10.1.20} $$


for a given $\varepsilon$. With this prescription we will have difficulties, however,
if $x_k = 0$. We therefore introduce, in addition to $\varepsilon$, a small positive constant $t$ and
extend (10.1.20) to

$$ |x_k - x_{k-1}| < \varepsilon\,|x_k| + t . \tag{10.1.21} $$


The last task remaining is to choose the numerical values for $\varepsilon$ and $t$. If
$x$ is the position of the minimum, then by (10.1.19) a value for $\varepsilon$ must be
chosen that is greater than or equal to the square root of the relative precision
for the representation of a floating-point number. With computations using
"double precision" in Java there are $n = 53$ binary places available for the
representation of the mantissa. Then only the values

$$ \varepsilon \ge 2^{-n/2} \approx 10^{-8} $$

are reasonable. The quantity $t$ corresponds to an absolute precision. Therefore
it can be chosen to be considerably smaller.
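The termination rule (10.1.21) and the lower bound on $\varepsilon$ fit in a few lines. The Java sketch below uses hypothetical class and method names. As an example of its use, it applies the rule to the fixed-point iteration $x_k = \cos x_{k-1}$, which is chosen only because it converges to a known value, about 0.739085.

```java
/** Termination test (10.1.21) and the choice of epsilon; an illustrative sketch. */
public class ConvergenceTest {
    /** Smallest reasonable relative precision, 2^(-n/2), for an n-bit mantissa. */
    public static double minimalEpsilon(int mantissaBits) {
        return Math.pow(2.0, -mantissaBits / 2.0);
    }

    /** Criterion (10.1.21): |x_k - x_{k-1}| < eps |x_k| + t. */
    public static boolean converged(double xk, double xkPrev, double eps, double t) {
        return Math.abs(xk - xkPrev) < eps * Math.abs(xk) + t;
    }

    /** Example use: iterate x_k = cos(x_{k-1}) until (10.1.21) holds. */
    public static double fixedPointOfCos(double x0, double eps, double t, int maxIter) {
        double prev = x0;
        for (int k = 0; k < maxIter; k++) {
            double next = Math.cos(prev);
            if (converged(next, prev, eps, t)) return next;
            prev = next;
        }
        return prev; // maxIter reached without convergence
    }
}
```

For the Java type `double`, `minimalEpsilon(53)` gives about $1.05 \cdot 10^{-8}$. The constant $t$ only matters when $x_k$ is close to zero, where the relative term $\varepsilon |x_k|$ vanishes.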


10.2  Parabola  Through  Three  Points 

If three points $(x_a, y_a)$, $(x_b, y_b)$, $(x_c, y_c)$ of a function are known, then we can
determine the parabola

$$ y = a_0 + a_1 x + a_2 x^2 \tag{10.2.1} $$

that passes through these points. Instead of (10.2.1) we can also represent the
parabola by

$$ y = c_0 + c_1 (x - x_b) + c_2 (x - x_b)^2 . \tag{10.2.2} $$

This relationship is naturally valid for the three given points, i.e.,

$$ \begin{aligned} y_b &= c_0 , \\ (y_a - y_b) &= c_1 (x_a - x_b) + c_2 (x_a - x_b)^2 , \\ (y_c - y_b) &= c_1 (x_c - x_b) + c_2 (x_c - x_b)^2 . \end{aligned} $$

From this we obtain

$$ \begin{aligned} c_1 &= C\,[(x_c - x_b)^2 (y_a - y_b) - (x_a - x_b)^2 (y_c - y_b)] , \\ c_2 &= C\,[-(x_c - x_b)(y_a - y_b) + (x_a - x_b)(y_c - y_b)] \end{aligned} \tag{10.2.3} $$

with

$$ C = \frac{1}{(x_a - x_b)(x_c - x_b)^2 - (x_c - x_b)(x_a - x_b)^2} , \tag{10.2.4} $$

and for the extremum of the parabola

$$ x_{mp} = x_b - \frac{c_1}{2 c_2} . \tag{10.2.5} $$


Fig. 10.2: Parabola through three points (small circles) and its extremum (large circle). In the
left figure there is a minimum, and on the right a maximum.

The class MinParab performs this simple calculation. We must still
determine whether the extremum of the parabola is a minimum or a maximum
(cf. Fig. 10.2). One has a minimum if the second derivative of (10.2.2)
with respect to $x - x_b$ is positive, i.e., if $c_2 > 0$. We now order the three given
points such that

$$ x_a < x_b < x_c $$

and find that then

$$ 1/C = (x_c - x_b)(x_a - x_b)(x_c - x_a) = -(x_c - x_b)(x_b - x_a)(x_c - x_a) < 0 . $$
With this one has for the sign of $c_2$

$$ \operatorname{sign} c_2 = \operatorname{sign}\,[(x_c - x_b)(y_a - y_b) + (x_b - x_a)(y_c - y_b)] . $$

Both expressions $(x_c - x_b)$ and $(x_b - x_a)$ are positive. Therefore for the
extremum to be a minimum it is sufficient that

$$ y_a > y_b , \qquad y_c > y_b . \tag{10.2.6} $$

The  condition  is  not  necessary,  but  has  the  advantage  of  greater  clarity.  In  the 
interval  xa  <  x  <  xc  one  clearly  has  a  minimum  if  there  is  a  point  (xb,  yb)  in 
the  interval  where  the  function  has  a  value  smaller  than  at  the  two  end  points. 
Clearly  this  statement  is  also  valid  if  the  function  is  not  a  parabola.  We  will 
make  use  of  this  fact  in  the  next  section. 
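The calculation (10.2.3) to (10.2.5) can be sketched in Java as follows. This is not the MinParab class itself, and the class name is hypothetical. The method returns the abscissa of the extremum together with $c_2$, so the caller can check for a minimum ($c_2 > 0$). For three collinear points $c_2 = 0$ and the result is not finite, so a caller should test for that case.

```java
/** Extremum of the parabola through three points, Eqs. (10.2.3)-(10.2.5); an illustrative sketch. */
public class ParabolaSketch {
    /** Returns {x_mp, c2}; the extremum is a minimum if c2 > 0. */
    public static double[] extremum(double xa, double ya, double xb, double yb,
                                    double xc, double yc) {
        double u = xa - xb, v = xc - xb;           // abscissae relative to x_b
        double p = ya - yb, q = yc - yb;           // ordinates relative to y_b
        double c = 1.0 / (u * v * v - v * u * u);  // C of Eq. (10.2.4)
        double c1 = c * (v * v * p - u * u * q);   // Eq. (10.2.3)
        double c2 = c * (-v * p + u * q);
        return new double[] {xb - c1 / (2.0 * c2), c2}; // Eq. (10.2.5)
    }
}
```

For the points $(0, 0)$, $(1, 1)$, $(3, 9)$ on $y = x^2$ the sketch returns $x_{mp} = 0$ with $c_2 = 1$, which is a minimum.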

10.3 Function of n Variables on a Line
in an n-Dimensional Space

Locating the minimum of the function $M(x)$ of a single variable $x$ in an
interval of the $x$ axis is equivalent to locating the minimum of a function $M(\mathbf{x})$
of an $n$-dimensional vector of variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ with respect to a
given line in the $n$-dimensional space. If $\mathbf{x}_0$ is a fixed point and $\mathbf{d}$ is a fixed
vector, then

$$ \mathbf{x}_0 + a\,\mathbf{d} , \qquad -\infty < a < \infty , \tag{10.3.1} $$

describes a fixed line (see Fig. 10.3), and

$$ f(a) = M(\mathbf{x}_0 + a\,\mathbf{d}) \tag{10.3.2} $$

is the value of the function at the point $a$ on this line. For $n = 1$, $x_0 = 0$, $d = 1$
and with the change of notation $a = x$, that is, $f(x) = M(x)$, one recovers the
original problem.

The  class  FunctionOnline  computes  the  value  (10.3.2);  it  makes  use 
of  an  extension  of  the  abstract  class  DatanUserFunction,  to  be  provided 
by  the  user,  which  defines  the  function  M(x).  In  Sects.  10.4  through  10.6  we 
consider  the  minimum  of  a  function  of  a  single  variable.  The  programs  also 
treat,  however,  the  case  of  a  minimum  of  a  function  of  n  variables  on  a  line  in 
the  n-dimensional  space. 
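The reduction (10.3.2) of an $n$-dimensional function to a function of the single line parameter $a$ can be sketched in Java. In place of the book's abstract class DatanUserFunction, this sketch assumes the standard interface `java.util.function.ToDoubleFunction<double[]>`, and the class name is hypothetical.

```java
import java.util.function.ToDoubleFunction;

/** Value of M on the line x0 + a d, Eq. (10.3.2); an illustrative sketch. */
public class LineFunctionSketch {
    /** Returns f(a) = M(x0 + a d). */
    public static double valueOnLine(ToDoubleFunction<double[]> m, double[] x0,
                                     double[] d, double a) {
        double[] x = new double[x0.length];
        for (int i = 0; i < x0.length; i++) x[i] = x0[i] + a * d[i]; // point on the line
        return m.applyAsDouble(x);
    }
}
```

Any one-dimensional minimizer that accepts a function of one variable can then be applied to `a -> valueOnLine(m, x0, d, a)`.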

10.4  Bracketing  the  Minimum 

For many minimization procedures it is important to know ahead of time that
the minimum $x_m$ is located in a specific interval,

$$ x_a < x_m < x_c . \tag{10.4.1} $$

Fig. 10.3: The line given by (10.3.1) in two dimensions.


By systematically reducing the interval, the position of the minimum can then
be further constrained until finally, for a given precision $\varepsilon$, one has

$$ x_c - x_a < \varepsilon . \tag{10.4.2} $$


There is, in fact, a minimum in the interval (10.4.1) if there is an $x$ value $x_b$
such that

$$ M(x_b) < M(x_a) , \qquad M(x_b) < M(x_c) , \qquad x_a < x_b < x_c . \tag{10.4.3} $$

The class MinEnclose attempts to bracket the minimum of a function
by giving values $x_a, x_b, x_c$ with the property (10.4.3). The program is based
on a similar subroutine by PRESS et al. [12]. Starting from the input values
$x_a, x_b$, which, if necessary, are relabeled so that $y_b < y_a$, a value
$x_c = x_b + p\,(x_b - x_a)$ is computed that presumably lies in the direction of decreasing
function value, i.e., closer to the minimum. The factor $p$ in our program is
set to $p = 1.618034$. In this way the original interval $(x_a, x_b)$ is enlarged by
the ratio of the golden section (cf. Sect. 10.5). The goal is reached if $y_c > y_b$.
If this is not the case, a parabola is constructed through the three points
$(x_a, y_a)$, $(x_b, y_b)$, $(x_c, y_c)$, whose minimum is at $x_m$.

We now examine the point $(x_m, y_m)$. Here one must distinguish between
various cases:

(a) $x_b < x_m < x_c$:

(a1) $y_m < y_c$: $(x_b, x_m, x_c)$ is the desired interval.

(a2) $y_b < y_m$: $(x_a, x_b, x_m)$ is the desired interval.

(a3) $y_m > y_c$ and $y_m < y_b$: No minimum is bracketed yet. The interval will be
extended further to the right.

(b) $x_c < x_m < x_{\mathrm{end}}$, with $x_{\mathrm{end}} = x_b + f\,(x_c - x_b)$ and $f = 10$ in our program:

(b1) $y_m > y_c$: $(x_b, x_c, x_m)$ is the desired interval.

(b2) $y_m < y_c$: No minimum is bracketed yet. The interval will be extended
further to the right.

(c) $x_{\mathrm{end}} < x_m$: $(x_b, x_c, x_{\mathrm{end}})$ is used as the new interval.

(d) $x_m < x_b$: This result is actually impossible. It can, however, be caused by
a rounding error. The interval will be extended further to the right.

If  the  goal  is  not  reached  in  the  current  step,  a  further  step  is  taken  with 
the  new  interval.  Figure  10.4  shows  an  example  of  the  individual  steps  carried 
out  until  the  bracketing  of  the  minimum  is  reached. 
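The core idea of the bracketing step can be illustrated with a simplified Java sketch. It assumes that plain golden-ratio expansion is sufficient and omits the parabolic step and the case analysis (a) to (d), so it is deliberately simpler than MinEnclose. The class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Simplified bracketing by downhill expansion; an illustrative sketch, not MinEnclose. */
public class BracketSketch {
    static final double P = 1.618034; // enlargement factor used in the text

    /**
     * Returns {xa, xb, xc} with M(xb) < M(xc) and M(xb) <= M(xa); xb lies between
     * xa and xc, but the triple may be in decreasing order. Returns null if maxSteps
     * expansions do not succeed. Unlike the method above, no parabolic step is tried.
     */
    public static double[] bracket(DoubleUnaryOperator m, double xa, double xb, int maxSteps) {
        double ya = m.applyAsDouble(xa), yb = m.applyAsDouble(xb);
        if (yb > ya) {                       // relabel so that the step xa -> xb goes downhill
            double s = xa; xa = xb; xb = s;
            s = ya; ya = yb; yb = s;
        }
        double xc = xb + P * (xb - xa);      // step further downhill, enlarged by P
        double yc = m.applyAsDouble(xc);
        for (int k = 0; k < maxSteps && yc <= yb; k++) {
            xa = xb; ya = yb;                // drop the old xa and move one step on
            xb = xc; yb = yc;
            xc = xb + P * (xb - xa);
            yc = m.applyAsDouble(xc);
        }
        return yc > yb ? new double[] {xa, xb, xc} : null;
    }
}
```

For $M(x) = (x - 3)^2$ and the starting values $x_a = 0$, $x_b = 1$, the sketch needs two expansions and returns approximately $(1, 2.618, 5.236)$, which encloses the minimum at $x = 3$.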


Fig.  10.4:  Bracketing  of  a  minimum  with  three  points  a,  b,  c  according  to  (10.4.3).  The  initial 
values  are  shown  as  larger  circles,  and  the  results  of  the  individual  steps  are  shown  as  small 
circles. 


10.5  Minimum  Search  with  the  Golden  Section 

As soon as the minimum has been enclosed by giving three points $x_a, x_b, x_c$
with the property (10.4.3), the bracketing can easily be tightened further. One
chooses a point $x$ inside the larger of the two subintervals $(x_a, x_b)$ and $(x_b, x_c)$.
If the value of the function at $x$ is smaller than at $x_b$, then the subinterval
containing $x$ is taken as the new interval. If the value of the function is greater,
then $x$ is taken as the endpoint of the new interval.

A particularly clever division of the intervals is possible with the golden
section. Let us assume (see Fig. 10.5) that

$$ g = \frac{x_b - x_a}{x_c - x_a} , \qquad \frac{1}{2} < g < 1 , \tag{10.5.1} $$

is the length of the subinterval $(x_a, x_b)$ (to be determined later) measured
in units of the length of the full interval $(x_a, x_c)$. We now want to be able
to divide the subinterval $(x_a, x_b)$ again with a point $x$ corresponding to a
fraction $g$,

$$ x - x_a = g\,(x_b - x_a) = g^2 (x_c - x_a) . \tag{10.5.2} $$

In addition, the points $x$ and $x_b$ should be situated symmetrically with respect
to each other in the interval $(x_a, x_c)$, i.e.,

$$ x - x_a = x_c - x_b = (1 - g)(x_c - x_a) . $$

It follows that

$$ g^2 = 1 - g , \tag{10.5.3} $$

that is,

$$ g^2 + g - 1 = 0 , $$

and

$$ g = \frac{\sqrt{5} - 1}{2} \approx 0.618034 . \tag{10.5.4} $$

Fig. 10.5: The golden section.
As shown at the beginning of this section (for the case shown in Fig. 10.5,
$x_b - x_a > x_c - x_b$), the minimum, which was originally only constrained to
be in the interval $(x_a, x_c)$, now lies either in the interval $(x_a, x_b)$ or in the
interval $(x, x_c)$. By subdividing by the golden section one obtains intervals of
equal size: both have the length $g\,(x_c - x_a)$, since $1 - g^2 = g$, so the bracket shrinks
by the factor $g$ in every step, whichever of the two intervals is retained.


Fig. 10.6: Stepwise bracketing of a minimum by subdividing according to the golden section.
The original interval (larger circles) is reduced with every step (small circles).


Figure  10.6  shows  the  first  six  steps  of  an  example  of  minimization  with 
the  golden  section. 
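A golden-section search can be sketched in Java. This is a common variant that keeps two interior points placed symmetrically in the bracket $(a, c)$ rather than the triple used above. It also needs one new function value per step and shrinks the bracket by $g$ each time. The class name is hypothetical, and the sketch assumes that the function has a single minimum in $(a, c)$ with $a < c$.

```java
import java.util.function.DoubleUnaryOperator;

/** Golden-section search; an illustrative sketch, not the book's class. */
public class GoldenSectionSketch {
    static final double G = (Math.sqrt(5.0) - 1.0) / 2.0; // g of Eq. (10.5.4), about 0.618034

    /** Shrinks the interval (a, c), assumed to contain a single minimum, until c - a < eps. */
    public static double minimize(DoubleUnaryOperator m, double a, double c, double eps) {
        double x1 = c - G * (c - a);                // two interior points placed
        double x2 = a + G * (c - a);                // symmetrically, cf. Fig. 10.5
        double f1 = m.applyAsDouble(x1), f2 = m.applyAsDouble(x2);
        for (int it = 0; it < 200 && c - a >= eps; it++) { // cap guards against eps below roundoff
            if (f1 < f2) {                          // minimum lies in (a, x2)
                c = x2;
                x2 = x1; f2 = f1;                   // old x1 is the golden point of (a, c)
                x1 = c - G * (c - a);
                f1 = m.applyAsDouble(x1);
            } else {                                // minimum lies in (x1, c)
                a = x1;
                x1 = x2; f1 = f2;
                x2 = a + G * (c - a);
                f2 = m.applyAsDouble(x2);
            }
        }
        return 0.5 * (a + c);
    }
}
```

By (10.1.19), `eps` should not be chosen much below the square root of the relative machine precision, because function values closer together than that cannot be distinguished reliably.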
10.6 Minimum Search with Quadratic Interpolation

From the example shown in Fig. 10.6 one sees that the procedure of interval
subdivision is very reliable but slow. We therefore now combine it
with quadratic interpolation.

The class MinCombined is based on a program developed by Brent
[13], who first combined the two methods. The meanings of the most important
variable names in the program are as follows: a and b denote the $x$ values
$x_a$ and $x_b$ that contain the minimum; xm is their mean value; m is the point at
which the function has its lowest value up to this step, W is the point with the
second lowest, and V is the point with the third lowest value of the function;
U is the point where the function was last computed. One begins with the
two initial values $x_a$ and $x_b$ that contain the minimum and adds a point $x$
that divides the interval $(x_a, x_b)$ according to the golden section. Then in each
subsequent iteration parabolic interpolation is attempted as in Sect. 10.2. The
result is accepted if it lies in the interval defined by the last step and if in this
step the change in the minimum is less than half as much as in the previous
one. This condition ensures that the procedure converges, i.e., that the
steps become smaller on average, although a temporary increase in step
size is tolerated. If either condition is not fulfilled, a reduction of the interval
is carried out according to the golden section.

Brent handles numerical questions particularly carefully.
Starting from the two parameters $\varepsilon$ and $t$, which define the relative and the absolute precision,
and the current value $x$ for the position of the minimum, an absolute precision
$\Delta x = \varepsilon |x| + t = \mathrm{tol}$ is computed according to (10.1.21). The iterations are
continued until half of the interval width falls below the value tol (i.e.,
until the distance from $x$ to xm is not greater than tol) or until the maximum
number of steps is reached. In addition, the function is not computed at points
that are separated by a distance less than tol, since such function values
would not differ significantly.

The  first  six  steps  of  an  example  of  minimization  according  to  Brent 
are  shown  in  Fig.  10.7.  Steps  according  to  the  golden  section  are  marked  by 
GS,  and  those  from  quadratic  interpolation  with  QI.  The  comparison  with 
Fig.  10.6  shows  the  considerably  faster  convergence  achieved  by  quadratic 
interpolation. 
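The combined method can be sketched in Java, following the widely published form of Brent's algorithm (as found, for example, in Numerical Recipes), rather than the MinCombined source. The class name is hypothetical. In the sketch `x` is the lowest point found so far (m in the text), `w` and `v` are the second and third lowest points, and `u` is the point evaluated last. A parabolic step is accepted only if it falls inside $(a, b)$ and is shorter than half the step before last. Otherwise a golden-section step into the larger part of the interval is taken.

```java
import java.util.function.DoubleUnaryOperator;

/** Combined golden-section and parabolic minimization after Brent; an illustrative sketch. */
public class BrentSketch {
    static final double CGOLD = 0.3819660; // 1 - g, golden-section fraction

    public static double minimize(DoubleUnaryOperator m, double a, double b,
                                  double eps, double t, int maxIter) {
        if (a > b) { double s = a; a = b; b = s; }
        double x = a + CGOLD * (b - a);          // golden-section point in (a, b)
        double w = x, v = x;                     // second and third lowest points
        double fx = m.applyAsDouble(x), fw = fx, fv = fx;
        double d = 0.0, e = 0.0;                 // last step and the step before it
        for (int iter = 0; iter < maxIter; iter++) {
            double xm = 0.5 * (a + b);
            double tol = eps * Math.abs(x) + t;  // absolute precision, Eq. (10.1.21)
            double tol2 = 2.0 * tol;
            if (Math.abs(x - xm) <= tol2 - 0.5 * (b - a)) break; // interval small enough
            boolean golden = true;
            if (Math.abs(e) > tol) {             // try a parabola through x, w, v
                double r = (x - w) * (fx - fv);
                double q = (x - v) * (fx - fw);
                double p = (x - v) * q - (x - w) * r;
                q = 2.0 * (q - r);
                if (q > 0.0) p = -p;
                q = Math.abs(q);
                double eOld = e;
                e = d;
                // accept if inside (a, b) and shorter than half the step before last
                if (Math.abs(p) < Math.abs(0.5 * q * eOld) && p > q * (a - x) && p < q * (b - x)) {
                    d = p / q;
                    double uTrial = x + d;
                    if (uTrial - a < tol2 || b - uTrial < tol2) d = Math.copySign(tol, xm - x);
                    golden = false;
                }
            }
            if (golden) {                        // golden-section step into the larger part
                e = (x >= xm) ? a - x : b - x;
                d = CGOLD * e;
            }
            // never evaluate M closer than tol to x
            double u = (Math.abs(d) >= tol) ? x + d : x + Math.copySign(tol, d);
            double fu = m.applyAsDouble(u);
            if (fu <= fx) {                      // u becomes the lowest point
                if (u >= x) a = x; else b = x;
                v = w; fv = fw; w = x; fw = fx; x = u; fx = fu;
            } else {                             // keep x, shrink interval, update w and v
                if (u < x) a = u; else b = u;
                if (fu <= fw || w == x) { v = w; fv = fw; w = u; fw = fu; }
                else if (fu <= fv || v == x || v == w) { v = u; fv = fu; }
            }
        }
        return x;
    }
}
```

For a function that is nearly parabolic near its minimum, most steps are parabolic, which accounts for the faster convergence seen in Fig. 10.7 compared to Fig. 10.6.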


10.7  Minimization  Along  a  Direction  in  n  Dimensions 

The class MinDir computes the minimum of a function of n variables
along the line defined in Sect. 10.3 by $\mathbf{x}_0$ and $\mathbf{d}$. It first uses MinEnclose to
bracket the minimum and then MinCombined to locate it to the required precision.
Fig. 10.7: Stepwise bracketing of a minimum with the combined method of Brent. The initial
interval (large circles) is reduced with each step (small circles). Steps are carried out
according to the golden section (GS) or by quadratic interpolation (QI).


The class MinDir is the essential tool used to realize a number of different
strategies to find a minimum in n-dimensional space, which we discuss
in Sects. 10.9 through 10.15. A different type of strategy is the basis of the simplex
method presented in the next section.


10.8  Simplex  Minimization  in  n  Dimensions 

A simple and very elegant (although relatively slow) procedure for determining
the minimum of a function of several variables is the simplex method by
Nelder and Mead [14]. The variables $x_1, x_2, \ldots, x_n$ define an $n$-dimensional
space. A simplex is defined in this space by $n + 1$ points $\mathbf{x}_i$,

$$ \mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{ni}) . \tag{10.8.1} $$

A simplex in two dimensions is a triangle with the corner points $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$.

We use $y_i$ to designate the value of the function at $\mathbf{x}_i$ and use particular
indices for labeling special points $\mathbf{x}_i$. At the point $\mathbf{x}_H$ the value of the function
is highest, i.e., $y_H > y_i$, $i \ne H$. At the point $\mathbf{x}_h$ it is higher than at all other
points except $\mathbf{x}_H$ ($y_h > y_i$, $i \ne H$, $i \ne h$), and at the point $\mathbf{x}_\ell$ it is lowest
($y_\ell < y_i$, $i \ne \ell$). The simplex is now changed step by step. In each substep one
of four operations takes place, these being reflection, stretching, flattening, or
contraction of the simplex (cf. Fig. 10.8).


Fig. 10.8: Transformation of a simplex (triangle with thick border) into a new form (triangle
with thin border) by means of a reflection (a), stretching (b), flattening (c), and
contraction (d).


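Using $\bar{\mathbf{x}}$ to designate the center of gravity of the (hyper)surface of the
simplex opposite to $\mathbf{x}_H$,

$$ \bar{\mathbf{x}} = \frac{1}{n} \sum_{i \ne H} \mathbf{x}_i , \tag{10.8.2} $$

the reflection of $\mathbf{x}_H$ is

$$ \mathbf{x}_r = (1 + \alpha)\,\bar{\mathbf{x}} - \alpha\,\mathbf{x}_H \tag{10.8.3} $$

with $\alpha > 0$ as the coefficient of reflection. The reflected simplex differs from
the original one only in that $\mathbf{x}_H$ is replaced by $\mathbf{x}_r$.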

A stretching of the simplex consists of replacing $\mathbf{x}_H$ by

$$ \mathbf{x}_e = \gamma\,\mathbf{x}_r + (1 - \gamma)\,\bar{\mathbf{x}} \tag{10.8.4} $$

with the coefficient of stretching $\gamma$. One therefore chooses (for $\gamma > 1$) a point
along the line joining $\mathbf{x}_H$ and $\mathbf{x}_r$ that lies still further away than the point $\mathbf{x}_r$.

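In a flattening, $\mathbf{x}_H$ is replaced by a point lying on the line joining $\mathbf{x}_H$
and $\bar{\mathbf{x}}$. This point is located between the two points,

$$ \mathbf{x}_a = \beta\,\mathbf{x}_H + (1 - \beta)\,\bar{\mathbf{x}} . \tag{10.8.5} $$

The coefficient of flattening $\beta$ is in the range $0 < \beta < 1$.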

In the three operations discussed up to now only one point of the simplex
is changed, that being the point $\mathbf{x}_H$, which corresponds to the highest value
of the function. The point is displaced along the line through $\mathbf{x}_H$ and $\bar{\mathbf{x}}$. After the
displacement it either lies on the same side of $\bar{\mathbf{x}}$ (flattening) or on the other
side of $\bar{\mathbf{x}}$ (reflection) or even far beyond $\bar{\mathbf{x}}$ (stretching). In contrast to these
operations, in a contraction, all points but one are replaced. The point $\mathbf{x}_\ell$ where
the function has its lowest value is retained. All remaining points are moved
to the midpoints of the line segments joining them with $\mathbf{x}_\ell$,

$$ \mathbf{x}_{ci} = (\mathbf{x}_i + \mathbf{x}_\ell)/2 , \qquad i \ne \ell . \tag{10.8.6} $$

For the original simplex and for each one created by an operation, the
points $\bar{\mathbf{x}}$ and $\mathbf{x}_r$ and the corresponding function values $\bar{y}$ and $y_r$ are computed.
The next operation is determined as follows:

(a) If $y_r < y_\ell$, a stretching is attempted. If this gives $y_e < y_\ell$, then the
stretching is carried out. Otherwise one performs a reflection.

(b) For $y_r \ge y_\ell$, a reflection is carried out if $y_r < y_H$. Otherwise the simplex is
unchanged. In each case this is followed by a flattening. If one obtains
as a result of the flattening a point $\mathbf{x}_a$ for which the function value
is not less than both $y_H$ and $\bar{y}$, the flattening is rejected, and instead a
contraction is carried out.


After every step we examine the quantity

$$ r = \frac{|y_H - y_\ell|}{|y_H| + |y_\ell|} . \tag{10.8.7} $$

If this falls below a given value, the procedure is terminated and we regard $\mathbf{x}_\ell$
as the point where the function has its minimum.

The class MinSim determines the minimum of a function of n variables
by the simplex method. The program is illustrated in the example of Fig. 10.9.
The triangle with the dark border is the initial simplex. The sequence of
triangles with thin borders starting from it corresponds to the individual
transformations. One clearly recognizes as the first steps: stretching, stretching,
reflection, reflection, flattening, .... The simplex first finds its way into the
"valley" of the function and then runs along the bottom of the valley towards
the minimum. It is reshaped in the process so that it is longest in the direction
in which it is progressing, and in this way it can also pass through narrow
valleys. It is worth remarking that this method, which is obtained by considering
the problem in two and three dimensions, also works in n dimensions
and retains a certain intuitive clarity in this case as well.

Fig. 10.9: Determining the minimum of a function of two variables with the simplex method.
The function is shown by contour lines on which the function is constant. The function is
highest on the outermost contour. The minimum is at the position of the small circle within
the innermost contour. Each simplex is a triangle. The initial simplex is marked by thicker
lines.
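The following Java sketch implements a widely used textbook form of the Nelder-Mead method with the operations (10.8.2) to (10.8.6) and the termination quantity (10.8.7). Its decision rules differ in detail from the rules (a) and (b) above: a reflection that lands between $y_\ell$ and $y_h$ is accepted without a subsequent flattening. The coefficients $\alpha = 1$, $\gamma = 2$, $\beta = 1/2$, the axis-aligned initial simplex, and the class name are assumptions made for illustration. This is not the MinSim program.

```java
import java.util.function.ToDoubleFunction;

/** Nelder-Mead simplex minimization in a common textbook form; an illustrative sketch, not MinSim. */
public class SimplexSketch {
    static final double ALPHA = 1.0, GAMMA = 2.0, BETA = 0.5; // reflection, stretching, flattening

    public static double[] minimize(ToDoubleFunction<double[]> m, double[] start,
                                    double step, double rTol, int maxIter) {
        int n = start.length;
        double[][] s = new double[n + 1][];
        double[] y = new double[n + 1];
        for (int i = 0; i <= n; i++) {              // start point plus one step along each axis
            s[i] = start.clone();
            if (i > 0) s[i][i - 1] += step;
            y[i] = m.applyAsDouble(s[i]);
        }
        for (int iter = 0; iter < maxIter; iter++) {
            int lo = 0, hi = 0;
            for (int i = 1; i <= n; i++) {
                if (y[i] < y[lo]) lo = i;
                if (y[i] > y[hi]) hi = i;
            }
            int nh = lo;                            // second-highest point x_h
            for (int i = 0; i <= n; i++) if (i != hi && y[i] > y[nh]) nh = i;
            // termination quantity r of Eq. (10.8.7); 1e-30 avoids 0/0
            double r = Math.abs(y[hi] - y[lo]) / (Math.abs(y[hi]) + Math.abs(y[lo]) + 1e-30);
            if (r < rTol) break;
            double[] c = new double[n];             // centroid, Eq. (10.8.2)
            for (int i = 0; i <= n; i++)
                if (i != hi) for (int j = 0; j < n; j++) c[j] += s[i][j] / n;
            double[] xr = combine(c, s[hi], 1 + ALPHA, -ALPHA);      // reflection (10.8.3)
            double yr = m.applyAsDouble(xr);
            if (yr < y[lo]) {
                double[] xe = combine(xr, c, GAMMA, 1 - GAMMA);      // stretching (10.8.4)
                double ye = m.applyAsDouble(xe);
                if (ye < y[lo]) { s[hi] = xe; y[hi] = ye; } else { s[hi] = xr; y[hi] = yr; }
            } else if (yr < y[nh]) {
                s[hi] = xr; y[hi] = yr;                              // plain reflection
            } else {
                if (yr < y[hi]) { s[hi] = xr; y[hi] = yr; }
                double[] xa = combine(s[hi], c, BETA, 1 - BETA);     // flattening (10.8.5)
                double ya = m.applyAsDouble(xa);
                if (ya < y[hi]) { s[hi] = xa; y[hi] = ya; }
                else {
                    for (int i = 0; i <= n; i++) if (i != lo) {      // contraction (10.8.6)
                        s[i] = combine(s[i], s[lo], 0.5, 0.5);
                        y[i] = m.applyAsDouble(s[i]);
                    }
                }
            }
        }
        int best = 0;
        for (int i = 1; i <= n; i++) if (y[i] < y[best]) best = i;
        return s[best];
    }

    /** Returns a p + b q componentwise. */
    static double[] combine(double[] p, double[] q, double a, double b) {
        double[] r = new double[p.length];
        for (int j = 0; j < p.length; j++) r[j] = a * p[j] + b * q[j];
        return r;
    }
}
```

Because (10.8.7) is a relative measure, it behaves poorly when the minimum value of the function is exactly zero. In that case a constant can be added to the function, as in the test function $(x - 1)^2 + 10(y + 2)^2 + 1$.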

10.9  Minimization  Along  the  Coordinate  Directions 

Some of the methods for searching for a minimum in an n-dimensional space
are based on the following principle. Starting from a point $\mathbf{x}_0$ one searches for
the minimum along a given direction in the space. Next one minimizes from
there along another direction and finds a new minimum, etc. Within this general
framework various strategies can be developed to choose the individual
directions.


The simplest strategy consists in the choice of the coordinate directions in
the space of the $n$ variables $x_i$. We can label the corresponding basis vectors
with $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$, and they are then chosen in order as directions. After $\mathbf{e}_n$
one begins again with $\mathbf{e}_1, \mathbf{e}_2, \ldots$. A partial sequence of minimizations along
all coordinate directions starting from $\mathbf{x}_0$ gives the point $\mathbf{x}_1$. After a new partial
sequence one obtains $\mathbf{x}_2$, etc.

The procedure is ended successfully when for the values of the function
$M(\mathbf{x}_n)$ and $M(\mathbf{x}_{n-1})$ for two successive steps one has

$$ M(\mathbf{x}_{n-1}) - M(\mathbf{x}_n) < \varepsilon\,|M(\mathbf{x}_n)| + t , \tag{10.9.1} $$

i.e., a condition corresponding to (10.1.21) for given values of $\varepsilon$ and $t$.
Here, however, we compare the values of the function $M$ and not the independent
variables $\mathbf{x}$; otherwise we would have to compute the distance between
two points in an $n$-dimensional space.

Figure  10.10  shows  the  minimization  of  the  same  function  as  in  Fig.  10.9 
with  the  coordinate-direction  method.  After  the  first  comparatively  large  step, 
which  leads  into  the  “valley”  of  the  function,  the  following  steps  are  quite 
small.  The  individual  directions  are  naturally  perpendicular  to  each  other.  The 
“best”  point  at  each  step  moves  along  a  staircase-like  path  along  the  floor  of 
the  valley  towards  the  minimum. 
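For a quadratic form, the minimum along a coordinate direction $\mathbf{e}_i$ can be found exactly: setting $\partial M/\partial x_i = -b_i + \sum_k A_{ik} x_k$ to zero and solving for $x_i$ gives the new coordinate. One sweep over all directions is then the Gauss-Seidel iteration for $A\mathbf{x} = \mathbf{b}$. The Java sketch below assumes such a quadratic form, $M(\mathbf{x}) = -\mathbf{b}^{\mathrm T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{\mathrm T}A\mathbf{x}$, instead of a general line minimizer, and stops according to (10.9.1). The class name is hypothetical.

```java
/** Minimization along the coordinate directions for a quadratic form; an illustrative sketch. */
public class CoordinateDescentSketch {
    /**
     * Minimizes M(x) = -b^T x + (1/2) x^T A x (A symmetric positive definite) by exact
     * minimization along e_1, ..., e_n in turn; stops by criterion (10.9.1).
     */
    public static double[] minimize(double[][] a, double[] b, double[] x0,
                                    double eps, double t, int maxSweeps) {
        int n = b.length;
        double[] x = x0.clone();
        double mOld = value(a, b, x);
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            for (int i = 0; i < n; i++) {
                // dM/dx_i = -b_i + sum_k A_ik x_k = 0, solved for x_i
                double s = b[i];
                for (int k = 0; k < n; k++) if (k != i) s -= a[i][k] * x[k];
                x[i] = s / a[i][i];
            }
            double mNew = value(a, b, x);
            if (mOld - mNew < eps * Math.abs(mNew) + t) break; // Eq. (10.9.1)
            mOld = mNew;
        }
        return x;
    }

    /** M(x) = -b^T x + (1/2) x^T A x. */
    static double value(double[][] a, double[] b, double[] x) {
        double v = 0.0;
        for (int i = 0; i < x.length; i++) {
            v -= b[i] * x[i];
            for (int k = 0; k < x.length; k++) v += 0.5 * x[i] * a[i][k] * x[k];
        }
        return v;
    }
}
```

With the strongly correlated matrix $A = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}$, which gives a long narrow valley, the error decreases only by a factor of $0.81$ per sweep, so several dozen sweeps are needed. This is the staircase behavior of Fig. 10.10.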

10.10  Conjugate  Directions 

The slow convergence seen in Fig. 10.10 clearly stems from the fact that when
minimizing along one direction, one loses the result of the minimization with
respect to another direction. We will now try to choose the directions in such
a way that this does not happen. For this we suppose for simplicity that the
function is a quadratic form (10.1.10). Its gradient at the point $\mathbf{x}$ is then given
by (10.1.12),

$$ \nabla M = -\mathbf{b} + A\mathbf{x} . \tag{10.10.1} $$

The change of the gradient in moving by $\Delta\mathbf{x}$ is

$$ \Delta(\nabla M) = \nabla M(\mathbf{x} + \Delta\mathbf{x}) - \nabla M(\mathbf{x}) = A\mathbf{x} + A\,\Delta\mathbf{x} - A\mathbf{x} = A\,\Delta\mathbf{x} . \tag{10.10.2} $$

This expression is a vector that gives the direction of change of the gradient
when the argument is moved in the direction $\Delta\mathbf{x}$.

If a minimization has been carried out along a direction $\mathbf{p}$, then the
reduction in $M$ with respect to $\mathbf{p}$ is retained when one moves in a direction $\mathbf{q}$
such that the gradient only changes perpendicular to $\mathbf{p}$, i.e., when one has

$$ \mathbf{p} \cdot (A\mathbf{q}) = \mathbf{p}^{\mathrm T} A \mathbf{q} = 0 . \tag{10.10.3} $$
Fig. 10.10: Minimization along the coordinate directions. The starting point is shown by the
larger circle, and the end by the smaller circle. The line shows the results of the individual
steps.


The vectors $\mathbf{p}$ and $\mathbf{q}$ are said to be conjugate to each other with respect to the
positive-definite matrix $A$. If one has $n$ variables, i.e., if $A$ is an $n \times n$ matrix,
then one can in general find $n$ linearly independent, mutually conjugate vectors.

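Powell [15] has given a method to find a set of conjugate directions
for a function described by a quadratic form. For this one chooses as a first
set of directions $n$ linearly independent unit vectors $\mathbf{p}_i$, e.g., the coordinate
directions $\mathbf{p}_i = \mathbf{e}_i$. Starting from a point $\mathbf{x}_0$ one finds successively the minima
in the directions $\mathbf{p}_i$. The results can be labeled by

$$ \begin{aligned} \mathbf{a}_1 &= \alpha_1 \mathbf{p}_1 + \alpha_2 \mathbf{p}_2 + \cdots + \alpha_n \mathbf{p}_n , \\ \mathbf{a}_2 &= \alpha_2 \mathbf{p}_2 + \cdots + \alpha_n \mathbf{p}_n , \\ &\;\;\vdots \\ \mathbf{a}_n &= \alpha_n \mathbf{p}_n . \end{aligned} $$

Here $\mathbf{a}_1$ is the vector representing all $n$ substeps, $\mathbf{a}_2$ contains all of the steps
except the first one, and $\mathbf{a}_n$ is just the last substep. The sum of the $n$ substeps
then leads from the point $\mathbf{x}_0$ to

$$ \mathbf{x}_1 = \mathbf{x}_0 + \mathbf{a}_1 . $$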
The direction $\mathbf{a}_1$ describes the average direction of the first $n$ substeps.
Therefore we carry out a step in the direction $\mathbf{a}_1$, call the result again $\mathbf{x}_0$, determine
a new set of directions

$$ \mathbf{q}_1 = \mathbf{p}_2 , \quad \mathbf{q}_2 = \mathbf{p}_3 , \quad \ldots , \quad \mathbf{q}_{n-1} = \mathbf{p}_n , \quad \mathbf{q}_n = \mathbf{a}_1/|\mathbf{a}_1| , $$

then call these $\mathbf{q}_i$ again $\mathbf{p}_i$, and proceed as above. As was shown by Powell
[15], the directions after $n$ steps, i.e., $n(n + 1)$ individual minimizations, are
mutually conjugate if the function is a quadratic form.


10.11  Minimization  Along  Chosen  Directions 

The procedure given at the end of the last section contains, however, the
danger that the directions $\mathbf{p}_1, \ldots, \mathbf{p}_n$ can become almost linearly dependent,
because at every step $\mathbf{p}_1$ is rejected in favor of $\mathbf{a}_1/|\mathbf{a}_1|$, and these directions
need not be very different from step to step. Powell has therefore suggested
not to replace the direction $\mathbf{p}_1$ by $\mathbf{a}_1/|\mathbf{a}_1|$, but rather to replace the direction
$\mathbf{p}_{\max}$, along which the greatest reduction of the function took place.
This sounds at first paradoxical, since what is clearly the best direction is
replaced by another. But since out of all the directions $\mathbf{p}_{\max}$ gave the largest
contribution towards reducing the function, $\mathbf{a}_1/|\mathbf{a}_1|$ has a significant
component in the direction $\mathbf{p}_{\max}$. By retaining these two similar directions one would
increase the danger of a linear dependence.

In some cases, however, after completing a step we will retain the old
directions without change. We denote by $\mathbf{x}_0$ the point before carrying out the
step in progress, by $\mathbf{x}_1$ the point obtained by this step, by

$$ \mathbf{x}_e = \mathbf{x}_1 + (\mathbf{x}_1 - \mathbf{x}_0) = 2\mathbf{x}_1 - \mathbf{x}_0 \tag{10.11.1} $$

an extrapolated point that lies in the new direction $\mathbf{x}_1 - \mathbf{x}_0$ from $\mathbf{x}_0$ but is still
further away than $\mathbf{x}_1$, and by $M_0$, $M_1$, and $M_e$ the corresponding function values. If

$$ M_e \ge M_0 , \tag{10.11.2} $$

then the function no longer decreases significantly in the direction $(\mathbf{x}_1 - \mathbf{x}_0)$.
We then remain with the previous directions.

Further, we denote by $\Delta M$ the greatest change in $M$ along a direction in
the step in progress and compute the quantity

$$ T = 2\,(M_0 - 2M_1 + M_e)(M_0 - M_1 - \Delta M)^2 - (M_0 - M_e)^2\,\Delta M . \tag{10.11.3} $$

If

$$ T \ge 0 , $$

then we retain the old directions. This requirement is satisfied if either the first
or second factor in the first term of (10.11.3) becomes large. The first factor
$(M_0 - 2M_1 + M_e)$ is proportional to a second derivative of the function $M$. If
it is large (compared to the first derivative in meaningful units), then we are
already close to the minimum. The second factor $(M_0 - M_1 - \Delta M)^2$ is large
when the reduction of the function $M_0 - M_1$, with the contribution $\Delta M$, does
not stem mainly from a single direction.
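The decision whether to keep the old set of directions depends only on the four numbers $M_0$, $M_1$, $M_e$, and $\Delta M$, and it can be written as a small Java function. The class name is hypothetical. The sketch takes the comparisons in (10.11.2) and in the test on $T$ to be non-strict ($\ge$), which is the convention in common implementations of Powell's method.

```java
/** Powell's test whether to keep the old set of directions; an illustrative sketch. */
public class PowellRuleSketch {
    /**
     * m0, m1: M before and after the step; me: M at x_e = 2 x_1 - x_0, Eq. (10.11.1);
     * dm: largest decrease of M along a single direction during the step.
     */
    public static boolean keepOldDirections(double m0, double m1, double me, double dm) {
        if (me >= m0) return true;                                // Eq. (10.11.2)
        double d1 = m0 - m1 - dm, d2 = m0 - me;
        double t = 2.0 * (m0 - 2.0 * m1 + me) * d1 * d1 - d2 * d2 * dm; // Eq. (10.11.3)
        return t >= 0.0;
    }
}
```

For example, with $M_0 = 10$, $M_1 = 5$, $M_e = 1$, and $\Delta M = 4.9$, the function keeps decreasing beyond $\mathbf{x}_1$ and almost all of the reduction came from one direction. Then $T < 0$, and that direction is replaced.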

The class MinPow determines the minimum of a function of n variables
by successive minimization along chosen directions according to Powell.
Figure 10.11 shows the advantages of this method. One can see a significantly
faster convergence to the minimum compared to that of Fig. 10.10.

10.12  Minimization  in  the  Direction  of  Steepest  Descent 

In order to get from a point $\mathbf{x}_0$ to the minimum of a function $M(\mathbf{x})$, it is
sufficient to always follow the negative gradient $\mathbf{b}(\mathbf{x}) = -\nabla M(\mathbf{x})$. It is thus
obvious that one should look for the minimum along the direction $\mathbf{b}_0 = -\nabla M(\mathbf{x}_0)$.
One calls this point $\mathbf{x}_1$, then searches for the minimum along $\mathbf{b}_1 = -\nabla M(\mathbf{x}_1)$, and so
forth, until the termination requirement (10.9.1) is satisfied.

A comparison of the steps taken with this method, shown in Fig. 10.12,
with those from minimization along the coordinate directions (Fig. 10.10) shows,
however, a surprising similarity. In both cases, successive steps are perpendicular
to each other. Thus the directions cannot adapt to the shape of the function in the
course of the procedure, and convergence is slow.

The initially surprising fact that successive gradient directions are perpendicular to each other stems from the construction of the procedure. Searching from $x_0$ for the minimum in the direction $b_0$ means that the derivative in the direction $b_0$ vanishes at the point $x_1$ of the minimum,

$$b_0 \cdot \nabla M(x_1) = 0 \;,$$

and thus the gradient at $x_1$, i.e., the next search direction, is perpendicular to $b_0$.
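The perpendicular zigzag can be checked numerically. The following Java sketch (class and method names are hypothetical) applies steepest descent to a quadratic form $M(x) = -b^T x + \tfrac{1}{2} x^T A x$, for which the line minimum along $d = b - Ax$ is available in closed form, $t = d^T d / d^T A d$. For $A = \mathrm{diag}(1, 10)$, $b = 0$ and the starting point $(10, 1)$, every step shrinks $x$ by exactly the factor $9/11$, so even after 20 steps the minimum at the origin is far from reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Steepest descent with exact line search on M(x) = -b.x + (1/2) x.A.x (illustrative sketch). */
final class SteepestDescentSketch {

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    static double[] times(double[][] a, double[] v) {
        double[] w = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int k = 0; k < v.length; k++) w[i] += a[i][k] * v[k];
        return w;
    }

    /** Returns the iterates x_0, x_1, ..., x_steps. */
    static double[][] path(double[][] a, double[] b, double[] x0, int steps) {
        List<double[]> pts = new ArrayList<>();
        double[] x = x0.clone();
        pts.add(x.clone());
        for (int s = 0; s < steps; s++) {
            double[] d = times(a, x);
            for (int i = 0; i < d.length; i++) d[i] = b[i] - d[i];   // d = b - A x = -grad M(x)
            double dd = dot(d, d);
            if (dd == 0.0) break;                                    // already at the minimum
            double t = dd / dot(d, times(a, d));                     // exact minimum along the line
            for (int i = 0; i < x.length; i++) x[i] += t * d[i];
            pts.add(x.clone());
        }
        return pts.toArray(new double[0][]);
    }

    /** Largest |cos| of the angle between successive steps x_{k+1} - x_k. */
    static double maxCosine(double[][] p) {
        double worst = 0.0;
        for (int k = 1; k + 1 < p.length; k++) {
            double[] s1 = new double[p[k].length], s2 = new double[p[k].length];
            for (int i = 0; i < s1.length; i++) {
                s1[i] = p[k][i] - p[k - 1][i];
                s2[i] = p[k + 1][i] - p[k][i];
            }
            worst = Math.max(worst, Math.abs(dot(s1, s2)) / Math.sqrt(dot(s1, s1) * dot(s2, s2)));
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] p = path(new double[][] {{1, 0}, {0, 10}}, new double[] {0, 0}, new double[] {10, 1}, 20);
        System.out.println("max |cos| between successive steps: " + maxCosine(p));
        System.out.printf("x_20 = (%.5f, %.5f)%n", p[20][0], p[20][1]);
    }
}
```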

10.13  Minimization  Along  Conjugate  Gradient  Directions 

We take up again the idea of conjugate directions from Sect. 10.10. Starting from an arbitrary vector $g_1 = h_1$, we construct two sequences of vectors,

$$g_{i+1} = g_i - \lambda_i A h_i \;, \qquad i = 1, 2, \ldots \;, \tag{10.13.1}$$

$$h_{i+1} = g_{i+1} + \gamma_i h_i \;, \qquad i = 1, 2, \ldots \;, \tag{10.13.2}$$

with

$$\lambda_i = \frac{g_i^T g_i}{g_i^T A h_i} \;, \qquad \gamma_i = -\frac{g_{i+1}^T A h_i}{h_i^T A h_i} \;. \tag{10.13.3}$$

Fig. 10.11: Minimization along a chosen direction with the method of Powell.

Fig. 10.12: Minimization in the direction of steepest descent.


Thus  one  has 

$$g_{i+1}^T g_i = 0 \;, \tag{10.13.4}$$

$$h_{i+1}^T A h_i = 0 \;. \tag{10.13.5}$$

This  means  that  successive  vectors  g  are  orthogonal,  and  successive  vectors  h 
are  conjugate  to  each  other. 

Rewriting  the  relation  (10.13.3)  gives 


$$\gamma_i = \frac{g_{i+1}^T g_{i+1}}{g_i^T g_i} \;, \tag{10.13.6}$$

$$\lambda_i = \frac{g_i^T h_i}{h_i^T A h_i} \;. \tag{10.13.7}$$


We now try to construct the vectors $g_i$ and $h_i$ without explicit knowledge of the Hessian matrix $A$. To do this we again assume that the function to be minimized is a quadratic form (10.1.10). At a point $x_i$ we define the vector $g_i = -\nabla M(x_i)$. If we now search for the minimum starting from $x_i$ along the direction $h_i$, we find it at $x_{i+1}$ and construct there $g_{i+1} = -\nabla M(x_{i+1})$. Then $g_{i+1}$ and $g_i$ are orthogonal, since from (10.1.12) one has

$$g_i = -\nabla M(x_i) = b - A x_i$$


and 


$$g_{i+1} = -\nabla M(x_{i+1}) = b - A(x_i + \lambda_i h_i) = g_i - \lambda_i A h_i \;. \tag{10.13.8}$$

Here $\lambda_i$ has been chosen such that $x_{i+1}$ is the minimum along the direction $h_i$. This means that there the gradient is perpendicular to $h_i$,

$$h_i^T \nabla M(x_{i+1}) = -h_i^T g_{i+1} = 0 \;. \tag{10.13.9}$$

Substituting (10.13.8) into (10.13.9) gives indeed

$$0 = h_i^T g_{i+1} = h_i^T g_i - \lambda_i h_i^T A h_i \;,$$

in agreement with (10.13.7).
From these results we can now construct the following algorithm. We begin at $x_0$, construct there the gradient $\nabla M(x_0)$, and set its negative equal to the two vectors

$$g_1 = -\nabla M(x_0) \;, \qquad h_1 = -\nabla M(x_0) \;.$$

We minimize along $h_1$. At the point of the minimum $x_1$ we construct $g_2 = -\nabla M(x_1)$ and compute from (10.13.6)

$$\gamma_1 = \frac{(g_2 - g_1)^T g_2}{g_1^T g_1}$$

(for a quadratic form this equals $g_2^T g_2 / g_1^T g_1$, since $g_2^T g_1 = 0$ by (10.13.4)) and from (10.13.2)

$$h_2 = g_2 + \gamma_1 h_1 \;.$$

From $x_1$ we then minimize along $h_2$, and so forth.
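A minimal Java sketch of the algorithm just described, with hypothetical names. It needs only a routine for the gradient. As an assumption of this sketch, each line minimization is a single secant step on the directional derivative, which is exact for a quadratic form but only approximate otherwise; $\gamma_i$ is computed in the form $(g_{i+1} - g_i)^T g_{i+1} / g_i^T g_i$ used above for $\gamma_1$. For a quadratic form in three variables, the exact minimum $A^{-1} b$ is reached after three line minimizations, up to rounding.

```java
import java.util.function.UnaryOperator;

/** Conjugate-gradient sketch following (10.13.2) and (10.13.6); not the class MinCjg. */
final class ConjugateGradientSketch {

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    /** Returns x + t*h. */
    static double[] axpy(double[] x, double t, double[] h) {
        double[] y = x.clone();
        for (int k = 0; k < y.length; k++) y[k] += t * h[k];
        return y;
    }

    /**
     * Line minimum along h from gradients only: phi'(t) = grad M(x + t h) . h is linear
     * in t for a quadratic form, so one secant step between t = 0 and t = 1 is exact there.
     */
    static double lineMin(UnaryOperator<double[]> grad, double[] x, double[] h) {
        double d0 = dot(grad.apply(x), h);
        double d1 = dot(grad.apply(axpy(x, 1.0, h)), h);
        return d0 / (d0 - d1);
    }

    static double[] minimize(UnaryOperator<double[]> grad, double[] x0, int maxIter, double tol) {
        double[] x = x0.clone();
        double[] g = grad.apply(x);
        for (int k = 0; k < g.length; k++) g[k] = -g[k];       // g_1 = -grad M(x_0)
        double[] h = g.clone();                                 // h_1 = g_1
        for (int it = 0; it < maxIter && Math.sqrt(dot(g, g)) > tol; it++) {
            x = axpy(x, lineMin(grad, x, h), h);                // minimize along h_i
            double[] gNew = grad.apply(x);
            for (int k = 0; k < gNew.length; k++) gNew[k] = -gNew[k];
            double num = 0.0;
            for (int k = 0; k < g.length; k++) num += (gNew[k] - g[k]) * gNew[k];
            double gamma = num / dot(g, g);                     // (g_{i+1} - g_i)^T g_{i+1} / g_i^T g_i
            for (int k = 0; k < h.length; k++) h[k] = gNew[k] + gamma * h[k];   // (10.13.2)
            g = gNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // M(x) = (1/2) x^T A x - b^T x with A = [[4,1,0],[1,3,1],[0,1,2]], b = (1,2,3)
        UnaryOperator<double[]> grad = v -> new double[] {
            4 * v[0] + v[1] - 1, v[0] + 3 * v[1] + v[2] - 2, v[1] + 2 * v[2] - 3};
        double[] x = minimize(grad, new double[3], 3, 0.0);
        System.out.printf("after 3 steps: (%.12f, %.12f, %.12f)%n", x[0], x[1], x[2]);
    }
}
```

The exact minimum of this example is $(2/9, 1/9, 13/9)$. The matrix $A$ enters only through the gradient routine, as required above.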

The class MinCjg determines the minimum of a function of n variables by successive minimization along conjugate gradient directions. A comparison of Fig. 10.13 with Fig. 10.12 shows the superiority of the method of conjugate gradients over determination of direction according to steepest descent, especially near the minimum.


Fig.  10.13:  Minimization  along  conjugate  gradient  directions. 
10.14 Minimization with the Quadratic Form

If the function $M(x)$ to be minimized is of the simple form (10.1.10), then the position of the minimum is given directly by (10.1.13). Otherwise one can always expand $M(x)$ about a point $x_0$,

$$M(x) = M(x_0) - b^T (x - x_0) + \tfrac{1}{2} (x - x_0)^T A (x - x_0) + \cdots \;, \tag{10.14.1}$$

with

$$b = -\nabla M(x_0) \;, \qquad A_{ik} = \frac{\partial^2 M}{\partial x_i \, \partial x_k} \;, \tag{10.14.2}$$

and obtain as an approximation for the minimum

$$x_1 = x_0 + A^{-1} b \;. \tag{10.14.3}$$

One can now compute $b$ and $A$ again as derivatives at the point $x_1$, from them obtain a further approximation $x_2$ according to (10.14.3), and so forth.
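The iteration (10.14.3) can be sketched in Java for two variables, with the gradient and the matrix $A$ of second derivatives supplied analytically. The test function $\cosh(x_1 - 1) + (x_2 - x_1)^2$, whose minimum lies at $(1, 1)$, and all names are illustrative choices, not taken from MinQdr. Since the function is not a quadratic form, several iterations are needed, but near the minimum each one gains many correct digits.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Iteration (10.14.3) for two variables with analytic derivatives (sketch; not the class MinQdr). */
final class QuadraticFormSketch {

    /** Solves the 2x2 system a d = b by Cramer's rule. */
    static double[] solve2(double[][] a, double[] b) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        return new double[] {(b[0] * a[1][1] - a[0][1] * b[1]) / det,
                             (a[0][0] * b[1] - a[1][0] * b[0]) / det};
    }

    static double[] minimize(UnaryOperator<double[]> grad, Function<double[], double[][]> hess,
                             double[] x0, int maxIter, double tol) {
        double[] x = x0.clone();
        for (int it = 0; it < maxIter; it++) {
            double[] g = grad.apply(x);
            double[] b = {-g[0], -g[1]};                         // b = -grad M(x), (10.14.2)
            double[] step = solve2(hess.apply(x), b);            // A^{-1} b
            x = new double[] {x[0] + step[0], x[1] + step[1]};   // (10.14.3)
            if (Math.hypot(step[0], step[1]) < tol) break;
        }
        return x;
    }

    public static void main(String[] args) {
        // M(x) = cosh(x1 - 1) + (x2 - x1)^2, minimum at (1, 1); not a quadratic form
        double[] x = minimize(
            v -> new double[] {Math.sinh(v[0] - 1) - 2 * (v[1] - v[0]), 2 * (v[1] - v[0])},
            v -> new double[][] {{Math.cosh(v[0] - 1) + 2, -2}, {-2, 2}},
            new double[] {0.0, 2.0}, 20, 1e-12);
        System.out.printf("minimum near (%.12f, %.12f)%n", x[0], x[1]);
    }
}
```

Started from $(0, 2)$, the first step already removes the quadratic part $(x_2 - x_1)^2$ exactly; the remaining error then shrinks roughly as the cube of the previous one.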

For the case where the quadratic approximation (10.14.1) gives a good description of the function $M(x)$, the procedure converges quickly, since it tries to jump directly to the minimum. Otherwise it might not converge at all. We have already discussed the difficulties for the corresponding one-dimensional case in Sect. 10.1 with Fig. 10.1.

The class MinQdr finds the minimum of a function of n variables with the quadratic form. Figure 10.14 illustrates the operation of the method. One can observe that the minimum is in fact reached in very few steps.

10.15  Marquardt  Minimization 

Marquardt [16] has given a procedure that combines the speed of minimization with the quadratic form near the minimum with the robustness of the method of steepest descent, which finds the minimum even when starting from a point far away. It is based on the following simple consideration.

The prescription (10.14.3), written as a computation of the $i$th approximation for the position of the minimum,

$$x_i = x_{i-1} + A^{-1} b \;, \tag{10.15.1}$$

means that one obtains $x_i$ from the point $x_{i-1}$ by taking a step along the vector $A^{-1} b$. Here $b = -\nabla M(x_{i-1})$ is the negative gradient, i.e., a vector in the direction of steepest descent of the function $M$ at the point $x_{i-1}$. If in (10.15.1) one had the unit matrix multiplied by a constant instead of the matrix $A$, i.e., if instead of (10.15.1) one were to use the prescription
Fig. 10.14: Minimization with quadratic form.


$$x_i = x_{i-1} + (\lambda I)^{-1} b \;, \tag{10.15.2}$$

then one would step by the vector $b/\lambda$ from $x_{i-1}$. This is a step in the direction of steepest descent of the function, which is smaller for larger values of the constant $\lambda$. A sufficiently small step in the direction of steepest descent is, however, always a step towards the minimum (at least when one is still in the “approach” to the minimum, i.e., in the one-dimensional case of Fig. 10.1, between the two maxima). The Marquardt procedure consists of interpolating between the prescriptions (10.15.1) and (10.15.2) in such a way that the function $M$ is reduced with every step, and such that the fast convergence of (10.15.1) is exploited as much as possible.

In place of (10.15.1) or (10.15.2) one computes

$$x_i = x_{i-1} + (A + \lambda I)^{-1} b \;. \tag{10.15.3}$$

Here $\lambda$ is determined in the following way. One first chooses a fixed number $\nu > 1$ and denotes by $\lambda^{(i-1)}$ the value of $\lambda$ from the previous step. As an initial value one chooses, e.g., $\lambda^{(0)} = 0.01$. The value of $x_i$ obtained from (10.15.3) clearly depends on $\lambda$. One computes two points, $x_i(\lambda^{(i-1)})$ and $x_i(\lambda^{(i-1)}/\nu)$, i.e., one chooses for $\lambda$ the values $\lambda^{(i-1)}$ and $\lambda^{(i-1)}/\nu$, and the corresponding function values $M_i = M(x_i(\lambda^{(i-1)}))$ and $M_i^{(\nu)} = M(x_i(\lambda^{(i-1)}/\nu))$. These are compared with the function value $M_{i-1} = M(x_{i-1})$. The result of the comparison determines what happens next. The following cases are possible:

(i) $M_i^{(\nu)} \le M_{i-1}$:

One sets $x_i = x_i(\lambda^{(i-1)}/\nu)$ and $\lambda^{(i)} = \lambda^{(i-1)}/\nu$.

(ii) $M_i^{(\nu)} > M_{i-1}$ and $M_i \le M_{i-1}$:

One sets $x_i = x_i(\lambda^{(i-1)})$ and $\lambda^{(i)} = \lambda^{(i-1)}$.

(iii) $M_i^{(\nu)} > M_{i-1}$ and $M_i > M_{i-1}$:

One replaces $\lambda^{(i-1)}$ by $\lambda^{(i-1)} \nu$, repeats the computation of $x_i(\lambda^{(i-1)}/\nu)$ and $x_i(\lambda^{(i-1)})$ and of the corresponding function values, and repeats the comparisons.

In this way one ensures that the function value in fact decreases with each step and that the value of $\lambda$ is always as small as possible when adjusted to the local situation. Clearly (10.15.3) becomes (10.15.1) for $\lambda \to 0$, i.e., it describes minimization with the quadratic form. For very large values of $\lambda$, Eq. (10.15.3) becomes the relation (10.15.2), which prescribes a small but sure step in the direction of steepest descent.
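The rules (i)-(iii) translate into a short loop. This Java sketch uses hypothetical names and is restricted to two variables, so that $(A + \lambda I)\,d = b$ can be solved by Cramer's rule. It uses $\nu = 10$, $\lambda^{(0)} = 0.01$, non-strict comparisons, a simple termination on the size of the gradient, and a safeguard that gives up after 60 increases of $\lambda$. These details are assumptions, not taken from MinMar. The example starts the Rosenbrock function $100(x_2 - x_1^2)^2 + (1 - x_1)^2$, a customary test case with a narrow curved valley, at $(-1.2, 1)$.

```java
import java.util.function.Function;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Marquardt iteration (10.15.3) with the lambda rules (i)-(iii), two variables (sketch; not MinMar). */
final class MarquardtSketch {

    /** Returns x + (A + lambda I)^{-1} b for a 2x2 matrix A (Cramer's rule). */
    static double[] step(double[] x, double[][] a, double[] b, double lambda) {
        double a00 = a[0][0] + lambda, a11 = a[1][1] + lambda;
        double det = a00 * a11 - a[0][1] * a[1][0];
        return new double[] {x[0] + (b[0] * a11 - a[0][1] * b[1]) / det,
                             x[1] + (a00 * b[1] - a[1][0] * b[0]) / det};
    }

    static double[] minimize(ToDoubleFunction<double[]> m, UnaryOperator<double[]> grad,
                             Function<double[], double[][]> hess, double[] x0, int maxIter) {
        final double nu = 10.0;                  // fixed number nu > 1
        double lambda = 0.01;                    // lambda^(0)
        double[] x = x0.clone();
        for (int it = 0; it < maxIter; it++) {
            double[] g = grad.apply(x);
            if (Math.hypot(g[0], g[1]) < 1e-10) break;          // simple termination rule (assumed)
            double[] b = {-g[0], -g[1]};
            double[][] a = hess.apply(x);
            double mPrev = m.applyAsDouble(x);                   // M_{i-1}
            for (int tries = 0; ; tries++) {
                double[] xNu = step(x, a, b, lambda / nu);
                if (m.applyAsDouble(xNu) <= mPrev) {             // case (i)
                    x = xNu; lambda /= nu; break;
                }
                double[] xLam = step(x, a, b, lambda);
                if (m.applyAsDouble(xLam) <= mPrev) {            // case (ii)
                    x = xLam; break;
                }
                lambda *= nu;                                    // case (iii): retry with larger lambda
                if (tries > 60) return x;                        // safeguard against an endless loop
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Rosenbrock function, minimum at (1, 1), started far away at (-1.2, 1)
        double[] x = minimize(
            v -> 100 * Math.pow(v[1] - v[0] * v[0], 2) + Math.pow(1 - v[0], 2),
            v -> new double[] {-400 * v[0] * (v[1] - v[0] * v[0]) - 2 * (1 - v[0]),
                               200 * (v[1] - v[0] * v[0])},
            v -> new double[][] {{1200 * v[0] * v[0] - 400 * v[1] + 2, -400 * v[0]},
                                 {-400 * v[0], 200}},
            new double[] {-1.2, 1.0}, 1000);
        System.out.printf("minimum near (%.8f, %.8f)%n", x[0], x[1]);
    }
}
```

Because a step is accepted only if it does not increase $M$, a Newton step that would overshoot across the valley is rejected and replaced by a shorter step closer to the direction of steepest descent.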


Fig.  10.15:  Minimization  with  the  Marquardt  procedure. 


The class MinMar finds the minimum of a function of n variables by Marquardt minimization. In Fig. 10.15 one sees the operation of the Marquardt method. It follows a systematic path to the minimum. Comparing with Fig. 10.14 one sees that the method of the quadratic form converged in fewer steps. The Marquardt procedure, however, leads to the minimum in many cases where the method of the quadratic form would fail, e.g., in cases with poorly determined initial values.


10.16  On  Choosing  a  Minimization  Method 

Given  the  variety  of  minimization  methods,  the  user  is  naturally  faced  with 
the  question  of  choosing  a  method  appropriate  to  the  task.  Before  we  give 
recommendations  for  this,  we  will  first  recall  the  various  methods  once  again. 

The simplex method (routine MinSim) is particularly robust. Only function values M(x) are computed. The method is slow, however. Faster, but still quite robust, is minimization along a chosen direction (routine MinPow). This also requires only function values.

The  method  of  conjugate  gradients  (routine  MinCjg)  requires,  as  the 
name  indicates,  the  computation  of  not  only  the  function,  but  also  its  gradient. 
The  number  of  iteration  steps  is,  however,  approximately  equal  to  that  of 
MinPow. 

For minimization with the quadratic form (routine MinQdr) and Marquardt minimization (routine MinMar) one requires in addition the Hessian matrix of second derivatives. The derivatives are computed numerically with utility routines. The user can replace these utility routines by routines in which the analytic formulas for the derivatives are programmed. If the starting values are sufficiently accurate, MinQdr converges after just a few steps. The convergence is slower for MinMar. The method is, however, more robust; it often converges when starting from values from which MinQdr would fail.

From  these  characteristics  of  the  methods  we  arrive  at  the  following 
recommendations: 

1.  For  problems  that  need  only  to  be  solved  once  or  not  many  times,  that 
is,  in  which  the  computing  time  does  not  play  an  important  role,  one 
should  choose  MinSim  or  MinPow. 

2. For problems occurring repeatedly (with different numerical values) one should use MinMar. If one always has an accurate starting approximation, MinQdr can be used.

3. For repeated problems the derivatives needed for MinMar or MinQdr should be calculated analytically. Although this entails additional programming work, one gains precision compared to numerical derivatives and in many cases saves computing time.
At this point we should make an additional remark about the comparison of the minimization methods of this chapter with the method of least squares from Chap. 9. The method of least squares is a special case of minimization. The function to be minimized is a sum of squares, e.g., (9.1.8), or the generalization of a sum of squares, e.g., (9.5.9). In this generalized sum of squares there appears the matrix of derivatives A. These are not, however, the derivatives of the function to be minimized, but rather derivatives of a function f, which characterizes the problem under consideration; cf. (9.5.2). Second derivatives are never needed. In addition, if one uses the singular value decomposition, as has been done in our programs in Chap. 9 to solve the least-squares problem, one works in numerically critical cases with a considerably higher accuracy than in the computation of sums of squares (cf. Sect. A.13, particularly Example A.4).

Least-squares problems should therefore always be carried out with the routines of Chap. 9. This applies particularly to problems of fitting of functions like those in the examples of Sect. 9.6, when one has many measured points. The matrix A then contains many rows, but few columns. In computing the function to be minimized the product $A^T A$ is encountered, and one is threatened with the above-mentioned loss of accuracy in comparison to the singular value decomposition.
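The loss of accuracy can be seen in a two-column example (a Läuchli-type matrix, chosen here only for illustration). With $\varepsilon = 10^{-8}$ the columns of $A$, with rows $(1, 1)$, $(\varepsilon, 0)$, $(0, \varepsilon)$, are linearly independent, but the exact $A^T A$ has diagonal elements $1 + \varepsilon^2$, and in double precision $1 + 10^{-16}$ rounds to 1. The computed $A^T A$ is therefore exactly singular, although $A$ is not. A method working on $A$ itself, such as the singular value decomposition, never forms this product. The class name in the sketch is hypothetical.

```java
/** Illustration of the precision loss when forming A^T A (Laeuchli-type matrix, chosen for illustration). */
final class NormalEquationsLoss {

    /** Returns the 2x2 matrix A^T A for A = [[1, 1], [eps, 0], [0, eps]]. */
    static double[][] ata(double eps) {
        double[][] a = {{1, 1}, {eps, 0}, {0, eps}};
        double[][] c = new double[2][2];
        for (int i = 0; i < 2; i++)
            for (int k = 0; k < 2; k++)
                for (int r = 0; r < 3; r++) c[i][k] += a[r][i] * a[r][k];
        return c;
    }

    static double det(double[][] c) {
        return c[0][0] * c[1][1] - c[0][1] * c[1][0];
    }

    public static void main(String[] args) {
        double eps = 1e-8;
        // A has rank 2: its lower 2x2 block diag(eps, eps) has determinant eps^2, which is not 0
        System.out.println("det of lower 2x2 block of A: " + eps * eps);
        // 1 + eps^2 rounds to 1 in double precision, so the computed A^T A is exactly singular
        System.out.println("det of computed A^T A:       " + det(ata(eps)));
    }
}
```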
In data analysis minimization procedures are used for determining best estimates $\hat{x}$ for unknown values $x$. The function $M(x)$ to be minimized is here usually a sum of squares (Chap. 9) or a log-likelihood function (multiplied by $-1$) as in Chap. 7. In using the equations from Chap. 7 one must note, however, that there the $n$-vector that was called $\lambda$ is now denoted by $x$. The variables in Chap. 7 that were called $x$ are the measured values (usually called $y$ in Chap. 9).

Information on the error of $\hat{x}$ is obtained directly from the results of Sects. 9.7, 9.8, and 9.13 by defining

$$H_{ik} = \left( \frac{\partial^2 M}{\partial x_i \, \partial x_k} \right)_{x = \hat{x}} \tag{10.17.1}$$

as the elements of the symmetric matrix of second derivatives (the Hessian matrix) of the minimization function. Here one must note that the factor $f_{QL}$ takes on the numerical value

$$f_{QL} = 1 \tag{10.17.2}$$

when the function being minimized is a sum of squares. If the function is a log-likelihood function (times $-1$), then it must be set equal to

$$f_{QL} = 1/2 \;. \tag{10.17.3}$$


1. Covariance matrix. Symmetric errors. The covariance matrix of $\hat{x}$ is

$$C_{\hat{x}} = 2 f_{QL} H^{-1} \;. \tag{10.17.4}$$

The square roots of the diagonal elements are the (symmetric) errors

$$\Delta \hat{x}_i = \sqrt{(C_{\hat{x}})_{ii}} \;. \tag{10.17.5}$$

It is only meaningful to give the covariance matrix if the measurement errors are small and/or there are many measurements, i.e., the requirements for (7.5.8) are fulfilled.

2. Confidence ellipsoid. Symmetric confidence limits. The covariance matrix defines the covariance ellipsoid; cf. Sects. 5.10 and A.11. Its center is given by $x = \hat{x}$. The probability that the true value of $x$ is contained inside the ellipsoid is given by (5.10.20). The ellipsoid for which this probability has a given value $W$, the confidence level, is given by the confidence matrix

$$C_{\hat{x}}^{(W)} = \chi^2_W(n_f) \, C_{\hat{x}} \;. \tag{10.17.6}$$

Here $\chi^2_W(n_f)$ is the quantile of the $\chi^2$-distribution for $n_f$ degrees of freedom and probability $P = W$; cf. (5.10.19) and (C.5.3). The number of degrees of freedom $n_f$ is equal to the number of measured values minus the number of parameters determined by the minimization. The square roots of the diagonal elements of $C_{\hat{x}}^{(W)}$ are the distances of the symmetric confidence limits from $\hat{x}_i$,

$$x_{i\pm}^{(W)} = \hat{x}_i \pm \sqrt{(C_{\hat{x}}^{(W)})_{ii}} \;. \tag{10.17.7}$$


The class MinCov yields the covariance or confidence matrix, respectively, for parameters determined by minimization.

3. Confidence region. If giving a covariance or confidence ellipsoid is not meaningful, it is still possible to give a confidence region with confidence level $W$. It is determined by the hypersurface

$$M(x) = M(\hat{x}) + \chi^2_W(n_f) \, f_{QL} \;. \tag{10.17.8}$$

A contour of a cross section through this hypersurface can be drawn in a plane that contains the point $\hat{x}$ and is parallel to $(x_i, x_j)$, where $x_i$ and $x_j$ are two components of the vector of parameters $x$. This boundary of the confidence region in a plane spanned by two parameters found by minimization can be shown graphically with the method DatanGraphics.drawContour; cf. Examples 10.1-10.3 and Example Programs 10.2-10.4.
4. Asymmetric errors and confidence limits. If the confidence region is not an ellipsoid, the asymmetric confidence limits for the variable $x_i$ can still be determined by

$$\min\{ M(x;\, x_i = x_{i\pm}^{(W)}) \} = M(\hat{x}) + \chi^2_W(n_f) \, f_{QL} \;, \tag{10.17.9}$$

where the minimum is taken with respect to all other components of $x$. The differences

$$\Delta x_{i+}^{(W)} = x_{i+}^{(W)} - \hat{x}_i \;, \qquad \Delta x_{i-}^{(W)} = \hat{x}_i - x_{i-}^{(W)} \tag{10.17.10}$$

are the (asymmetric) distances of the confidence limits from $\hat{x}_i$. If one takes $\chi^2_W(n_f) = 1$, the asymmetric errors $\Delta x_{i+}$ and $\Delta x_{i-}$ are obtained. The class MinAsy yields asymmetric errors or the distances of the confidence limits, respectively, for parameters determined by minimization.
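As an illustration of (10.17.9), consider the width $x_2$ of a normal sample as in Example 10.1. There the minimization over the other parameter can be done analytically, since for fixed $x_2$ the minimum of $-\ell$ with respect to $x_1$ lies at the sample mean. This Java sketch (hypothetical names, illustrative data) uses that shortcut together with $f_{QL} = 1/2$ and $\chi^2_W(n_f) = 1$, and finds both crossings by bisection. The upper distance comes out larger than the lower one, and the symmetric error $\hat{x}_2/\sqrt{2N}$ from the inverse Hessian lies between them.

```java
/** Asymmetric limits (10.17.9) for the width x2 of a normal sample (illustrative sketch; not MinAsy). */
final class AsymmetricErrorSketch {

    static double mean(double[] y) {
        double s = 0.0;
        for (double v : y) s += v;
        return s / y.length;
    }

    /** -l from (10.18.2) without its constant, already minimized over x1 (at x1 = sample mean). */
    static double profile(double[] y, double x2) {
        double mu = mean(y), s = 0.0;
        for (double v : y) s += (v - mu) * (v - mu);
        return s / (2 * x2 * x2) + y.length * Math.log(x2);
    }

    /** Bisection for profile(y, x2) = target on [lo, hi], assuming a single crossing there. */
    static double solve(double[] y, double target, double lo, double hi) {
        boolean rising = profile(y, hi) > profile(y, lo);
        for (int k = 0; k < 200; k++) {
            double mid = 0.5 * (lo + hi);
            if ((profile(y, mid) > target) == rising) hi = mid; else lo = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** Returns {x2hat, dx2-, dx2+} for a given chi2_W(n_f); f_QL = 1/2 for a log-likelihood. */
    static double[] limits(double[] y, double chi2) {
        double mu = mean(y), s = 0.0;
        for (double v : y) s += (v - mu) * (v - mu);
        double x2hat = Math.sqrt(s / y.length);               // maximum-likelihood width
        double target = profile(y, x2hat) + chi2 * 0.5;       // right-hand side of (10.17.9)
        double hi = 2 * x2hat, lo = x2hat / 2;
        while (profile(y, hi) < target) hi *= 2;              // bracket the upper crossing
        while (profile(y, lo) < target) lo /= 2;              // bracket the lower crossing
        double upper = solve(y, target, x2hat, hi);
        double lower = solve(y, target, lo, x2hat);
        return new double[] {x2hat, x2hat - lower, upper - x2hat};
    }

    public static void main(String[] args) {
        double[] y = {4.8, 5.3, 5.1, 4.6, 5.7, 5.0, 4.9, 5.4};
        double[] r = limits(y, 1.0);
        System.out.printf("x2 = %.4f  -%.4f / +%.4f  (symmetric: %.4f)%n",
                          r[0], r[1], r[2], r[0] / Math.sqrt(2.0 * y.length));
    }
}
```

In general the inner minimization in (10.17.9) has to be done numerically for every trial value of $x_i$; the analytic profile used here is a special property of this example.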


10.18  Examples 

Example  10.1:  Determining  the  parameters  of  a  distribution  from  the 
elements  of  a  sample  with  the  method  of  maximum  likelihood 

Suppose one has $N$ measurements $y_1, y_2, \ldots, y_N$ that can be assumed to come from a normal distribution with expectation value $a = x_1$ and standard deviation $\sigma = x_2$. The likelihood function is

$$L = \prod_{i=1}^{N} \frac{1}{x_2 \sqrt{2\pi}} \exp\left\{ -\frac{(y_i - x_1)^2}{2 x_2^2} \right\} \tag{10.18.1}$$

and its logarithm is

$$\ell = -\sum_{i=1}^{N} \frac{(y_i - x_1)^2}{2 x_2^2} - N \ln\{ x_2 \sqrt{2\pi} \} \;. \tag{10.18.2}$$


The task of determining the maximum-likelihood estimators $\hat{x}_1$, $\hat{x}_2$ has already been solved in Example 7.8 by setting the analytically calculated derivatives of the function $\ell(x)$ to zero. Here we will accomplish this by means of numerical minimization of $-\ell(x)$. For this we must first provide a function that computes the function to be minimized,

$$M(x) = -\ell(x) \;.$$

This  simple  example  is  implemented  in  Example  Programs  10.2  and  10.3. 
The  results  of  the  minimization  of  this  user  function  are  shown  in  Fig.  10.16 
for  two  samples.  Confidence  regions  and  covariance  ellipses  agree  reasonably 
well  with  each  other.  The  agreement  is  better  for  the  larger  sample.  ■ 
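As an illustration of how the user function and (10.17.1)-(10.17.4) fit together, this Java sketch evaluates $M(x) = -\ell(x)$ from (10.18.2), forms the Hessian by central differences with an assumed step $h = 10^{-4}$, and computes $C = 2 f_{QL} H^{-1}$ with $f_{QL} = 1/2$. To keep it short, the minimum is not searched numerically; the Hessian is evaluated at the analytic estimates of Example 7.8 (sample mean and root mean squared deviation). Names and data are hypothetical. The diagonal elements should agree with the analytic values $\hat{x}_2^2/N$ and $\hat{x}_2^2/(2N)$.

```java
import java.util.function.ToDoubleFunction;

/** Covariance (10.17.4) from a numerical Hessian (10.17.1) for Example 10.1 (sketch; not MinCov). */
final class CovarianceSketch {

    /** M(x) = -l(x) of (10.18.2): x[0] = mean, x[1] = standard deviation. */
    static double negLogLike(double[] y, double[] x) {
        double s = 0.0;
        for (double v : y) s += (v - x[0]) * (v - x[0]);
        return s / (2 * x[1] * x[1]) + y.length * Math.log(x[1] * Math.sqrt(2 * Math.PI));
    }

    /** Central-difference Hessian; the step h is an assumed value, not taken from the book. */
    static double[][] hessian(ToDoubleFunction<double[]> m, double[] x, double h) {
        int n = x.length;
        double[][] hes = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double[] pp = x.clone(), pm = x.clone(), mp = x.clone(), mm = x.clone();
                pp[i] += h; pp[k] += h;
                pm[i] += h; pm[k] -= h;
                mp[i] -= h; mp[k] += h;
                mm[i] -= h; mm[k] -= h;
                hes[i][k] = (m.applyAsDouble(pp) - m.applyAsDouble(pm)
                           - m.applyAsDouble(mp) + m.applyAsDouble(mm)) / (4 * h * h);
            }
        return hes;
    }

    /** C = 2 f_QL H^{-1} for a 2x2 Hessian. */
    static double[][] covariance(double[][] hes, double fQL) {
        double c = 2 * fQL / (hes[0][0] * hes[1][1] - hes[0][1] * hes[1][0]);
        return new double[][] {{c * hes[1][1], -c * hes[0][1]}, {-c * hes[1][0], c * hes[0][0]}};
    }

    /** Maximum-likelihood estimates: sample mean and root of the mean squared deviation. */
    static double[] estimate(double[] y) {
        double mu = 0.0, s = 0.0;
        for (double v : y) mu += v;
        mu /= y.length;
        for (double v : y) s += (v - mu) * (v - mu);
        return new double[] {mu, Math.sqrt(s / y.length)};
    }

    public static void main(String[] args) {
        double[] y = {4.8, 5.3, 5.1, 4.6, 5.7, 5.0, 4.9, 5.4};
        double[] xHat = estimate(y);
        double[][] c = covariance(hessian(x -> negLogLike(y, x), xHat, 1e-4), 0.5);
        System.out.printf("x1 = %.4f +- %.4f, x2 = %.4f +- %.4f%n",
                          xHat[0], Math.sqrt(c[0][0]), xHat[1], Math.sqrt(c[1][1]));
    }
}
```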


Example  10.2:  Determination  of  the  parameters  of  a  distribution  from  the 
histogram  of  a  sample  by  maximizing  the  likelihood 

Instead of the original sample $y_1, y_2, \ldots, y_N$ as in Example 10.1, one often uses the corresponding histogram. Denoting by $n_i$ the number of observations that fall into the interval centered about the point $t_i$ with width $\Delta t$,

$$t_i - \Delta t/2 < y < t_i + \Delta t/2 \;, \tag{10.18.3}$$

the histogram is given by the pairs of numbers

$$(t_i, n_i) \;, \qquad i = 1, 2, \ldots, n \;. \tag{10.18.4}$$

If the original sample is taken from a normal distribution with expectation value $x_1 = a$ and standard deviation $x_2 = \sigma$, i.e., from the probability density

$$f(t; x_1, x_2) = \frac{1}{x_2 \sqrt{2\pi}} \exp\left\{ -\frac{(t - x_1)^2}{2 x_2^2} \right\} \;, \tag{10.18.5}$$

then one might expect the $n_i(t_i)$ (at least in the limit $N \to \infty$) to be given by

$$g_i = N \, \Delta t \, f(t_i; x_1, x_2) \;. \tag{10.18.6}$$

The quantities $n_i(t_i)$ are integers, which in general are clearly not equal to $g_i$. We can, however, regard each $n_i(t_i)$ as a sample of size one from a Poisson distribution with the expectation value

$$\lambda_i = g_i \;. \tag{10.18.7}$$
Fig. 10.16: Determination of the parameters $x_1$ (mean) and $x_2$ (width) of a Gaussian distribution by maximization of the log-likelihood function of a sample. The plots on the left show two different samples, marked as one-dimensional scatter plots (tick marks) on the y axis. The curves f(y) are Gaussian distributions with the fitted parameters. The right-hand plots show in the $(x_1, x_2)$ plane the values of the parameters obtained with symmetric errors and covariance ellipses, as well as the confidence region for $\chi^2_W = 1$ and the corresponding confidence boundaries (horizontal and vertical lines).


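The a posteriori probability to observe the value $n_i(t_i)$ is clearly

$$\frac{1}{n_i!} \, \lambda_i^{n_i} \, e^{-\lambda_i} \;. \tag{10.18.8}$$

The likelihood function for the observation of the entire histogram is

$$L = \prod_{i=1}^{n} \frac{1}{n_i!} \, \lambda_i^{n_i} \, e^{-\lambda_i} \tag{10.18.9}$$

and its logarithm is

$$\ell = -\sum_{i=1}^{n} \ln n_i! + \sum_{i=1}^{n} n_i \ln \lambda_i - \sum_{i=1}^{n} \lambda_i \;. \tag{10.18.10}$$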


If we use for $\lambda_i$ the notation of (10.18.7) and find the minimum of $-\ell$ with respect to $x_1$, $x_2$, then this determines the best estimates of the parameters $x_1$, $x_2$. Of course the same procedure can be applied not only in the case of a simple Gaussian distribution, but for any distribution that depends on parameters. One must simply use the corresponding probability density in (10.18.5) and (10.18.6). In the user function only one instruction must be changed in order to replace the Gaussian by another distribution. This example is implemented in Example Program 10.4. There the user function carries the name MinLogLikeHistPoison.


Fig. 10.17: Determination of the parameters $x_1$ (mean) and $x_2$ (width) of a Gaussian distribution by maximizing the log-likelihood function of a histogram. The left-hand plots show two histograms corresponding to the samples from Fig. 10.16. The curves are the Gaussian distributions normalized to the histograms. The right-hand plots show the symmetric errors, covariance ellipse, and confidence region represented as in Fig. 10.16.


The results of the minimization with this user function are shown in Fig. 10.17 for two histograms, which are based on the samples from Fig. 10.16. The results are very similar to those from Example 10.1. The errors of the parameters, however, are somewhat larger. This is to be expected, since some information is necessarily lost when constructing a histogram from the sample.

A histogram can be viewed as a sample in a compressed representation. The compression becomes greater as the bin width of the histogram increases. This is made clear in Fig. 10.18. One sees that for the same sample, the errors of the determined parameters increase for greater bin width. The effect is relatively small, however, for the relatively large sample size in this case. ■
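A Java sketch of the function $-\ell$ of (10.18.10) for the Gaussian case, with $\lambda_i = g_i$ from (10.18.6). Names are hypothetical, and $N$ is taken as the sum of the bin contents, which assumes that the histogram covers the whole sample. Empty bins are kept, as they must be here. They contribute only $\lambda_i$; the term $n_i \ln \lambda_i$ is skipped for them, so that a vanishing $\lambda_i$ far out in the tail does not produce $0 \cdot (-\infty)$. The term $\sum \ln n_i!$ does not depend on $x$ and could be dropped for the minimization.

```java
/** M = -l of (10.18.10) with lambda_i = g_i of (10.18.6) for a Gaussian density (sketch). */
final class HistogramLikelihoodSketch {

    /** Normal probability density (10.18.5). */
    static double density(double t, double x1, double x2) {
        double z = (t - x1) / x2;
        return Math.exp(-0.5 * z * z) / (x2 * Math.sqrt(2 * Math.PI));
    }

    /** ln(n!) by direct summation, adequate for moderate bin contents. */
    static double logFactorial(int n) {
        double s = 0.0;
        for (int k = 2; k <= n; k++) s += Math.log(k);
        return s;
    }

    /** -l(x1, x2); bin centers t, contents n, bin width dt; N is taken as the sum of the contents. */
    static double negLogLike(double[] t, int[] n, double dt, double x1, double x2) {
        int total = 0;
        for (int c : n) total += c;
        double m = 0.0;
        for (int i = 0; i < t.length; i++) {
            double lambda = total * dt * density(t[i], x1, x2);   // (10.18.6), (10.18.7)
            m += logFactorial(n[i]) + lambda;
            if (n[i] > 0) m -= n[i] * Math.log(lambda);           // empty bins contribute only lambda_i
        }
        return m;
    }

    public static void main(String[] args) {
        double[] t = {-2, -1, 0, 1, 2};
        int[] n = {1, 6, 9, 3, 1};
        System.out.println("-l(0, 1) = " + negLogLike(t, n, 1.0, 0.0, 1.0));
        System.out.println("-l(1, 1) = " + negLogLike(t, n, 1.0, 1.0, 1.0));
    }
}
```

Passing this function to any of the minimizers above (for instance as a lambda over $x$) gives the estimates of Example 10.2; replacing `density` is the one change needed for another distribution.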
Fig. 10.18: As in Fig. 10.17, but for histograms with different interval widths of the same sample.


Example  10.3:  Determination  of  the  parameters  of  a  distribution  from  the 
histogram  of  a  sample  by  minimization  of  a  sum  of  squares 

If the bin contents $n_i$ of a histogram (10.18.4) are sufficiently large, then the statistical fluctuations of each $n_i$ can be approximated by a Gaussian distribution with the standard deviation

$$\Delta n_i = \sqrt{n_i} \;. \tag{10.18.11}$$

(See Sect. 6.8.) The weighted sum of squares describing the deviation of the histogram contents $n_i$ from the expected values $g_i$ from (10.18.6) is then

$$Q = \sum_{i=1}^{n} \frac{(n_i - g_i)^2}{n_i} \;. \tag{10.18.12}$$
When carrying out the sum one must take care (in contrast to Example 10.2) that empty bins ($n_i = 0$) are not included. It is even better to exclude bins with low contents as well, e.g., $n_i < 4$.

This example is also implemented in Example Program 10.4. Here $g_i$ is given by (10.18.6) as in the previous example. The sum is carried out over all bins with $n_i > 0$. The user function is called MinHistSumOfSquares.

The statistical fluctuations of the bin contents are taken as approximately Gaussian.

Figure 10.19 shows the results obtained by minimizing the quadratic sum for the histograms in Fig. 10.18. Since the histograms are based on the same sample, the values $n_i$ decrease with decreasing bin width, so that the requirement for using the sum of squares becomes less well fulfilled. Thus we cannot trust the results or the errors as much for smaller bin widths. One sees, however, that the errors given by the procedure in fact increase for decreasing bin width. ■
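The corresponding Java sketch of $Q$ from (10.18.12), with hypothetical names. Bins with $n_i$ below a chosen threshold are skipped, and the threshold is never allowed to fall below 1, so that empty bins are always excluded, as required above. The expected contents $g_i$ are computed from (10.18.6) with a Gaussian $f$.

```java
/** Q of (10.18.12) with the bin selection discussed in the text (sketch). */
final class HistogramSumOfSquaresSketch {

    /** Expected contents g_i = N dt f(t_i; x1, x2), (10.18.6), for a Gaussian f. */
    static double[] expected(double[] t, int total, double dt, double x1, double x2) {
        double[] g = new double[t.length];
        for (int i = 0; i < t.length; i++) {
            double z = (t[i] - x1) / x2;
            g[i] = total * dt * Math.exp(-0.5 * z * z) / (x2 * Math.sqrt(2 * Math.PI));
        }
        return g;
    }

    /** Sum over bins with n_i >= max(nMin, 1); the weights 1/n_i follow from (10.18.11). */
    static double q(int[] n, double[] g, int nMin) {
        int threshold = Math.max(nMin, 1);                 // never include empty bins
        double s = 0.0;
        for (int i = 0; i < n.length; i++) {
            if (n[i] < threshold) continue;
            double d = n[i] - g[i];
            s += d * d / n[i];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] t = {-2, -1, 0, 1, 2};
        int[] n = {1, 6, 9, 3, 1};
        double[] g = expected(t, 20, 1.0, 0.0, 1.0);
        System.out.println("Q (n_i >= 1) = " + q(n, g, 1));
        System.out.println("Q (n_i >= 4) = " + q(n, g, 4));
    }
}
```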

We emphasize that the determination of parameters from a histogram by quadratic-sum minimization gives less exact results than those obtained by likelihood maximization. This is because the assumption of a normal distribution of the $n_i$ with the width (10.18.11) is only an approximation, which often requires large bin widths for the histogram and thus implies a loss of information. If, however, enough data, i.e., sufficiently large samples, are available, then the difference between the two procedures is small. One should compare, e.g., Fig. 10.18 (upper right-hand plot) with Fig. 10.19 (lower right-hand plot).


10.19  Java  Classes  and  Example  Programs 

Java  Classes  for  Minimization  Problems 

MinParab  finds  the  extremum  of  a  parabola  through  three  given  points. 

FunctionOnline computes the value of a function on a straight line in n-dimensional space.

MinEnclose brackets the minimum on a straight line in n-dimensional space.

MinCombined finds the minimum in a given interval along a straight line with a combined method according to Brent.

MinDir finds the minimum on a straight line in n-dimensional space.

MinSim finds the minimum in n-dimensional space using the simplex method.

Fig. 10.19: Determination of the parameters $x_1$ (mean) and $x_2$ (width) of a Gaussian distribution by minimization of a weighted sum of squares. The histograms on the left are the same as in Fig. 10.18. The fit results and errors, covariance ellipses, and confidence regions are represented as in Figs. 10.17 and 10.18.

MinPow finds the minimum in n-dimensional space with Powell’s method of chosen directions.

MinCjg finds the minimum in n-dimensional space with the method of conjugate directions.

MinQdr finds the minimum in n-dimensional space with the quadratic form.

MinMar finds the minimum in n-dimensional space with Marquardt’s method.

MinCov finds the covariance matrix of the coordinates of the minimum.
MinAsy  finds  the  asymmetric  errors  of  the  coordinates  of  the  minimum. 

Example  Program  10.1:  The  class  ElMin  demonstrates  the  use  of 
MinSim,  MinPow,  MinCjg,  MinQdr,  and  MinMar 

The program calls one of the classes, as requested by the user, in order to solve the following problem. One wants to find the minimum of the function $f = f(x) = f(x_1, x_2, x_3)$. The search is started at the point $x^{(\mathrm{in})} = (x_1^{(\mathrm{in})}, x_2^{(\mathrm{in})}, x_3^{(\mathrm{in})})$, which is also input by the user. The program treats consecutively four different cases:

(i) No variables are fixed,

(ii) $x_3$ is fixed,

(iii) $x_2$ and $x_3$ are fixed,

(iv) All variables are fixed.

The  user  can  choose  one  of  the  following  functions  to  be  minimized: 

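$$f_1(x) = r^2 \;, \qquad r = \sqrt{x_1^2 + x_2^2 + x_3^2} \;,$$

$$f_2(x) = r^{10} \;,$$

$$f_3(x) = r \;,$$

$$f_4(x) = -e^{-r^2} \;,$$

$$f_5(x) = r^6 - 2r^4 + r^2 \;,$$

$$f_6(x) = r^2 e^{-r^2} \;,$$

$$f_7(x) = -e^{-r^2} - 10\, e^{-r'^2} \;, \qquad r'^2 = (x_1 - 3)^2 + (x_2 - 3)^2 + (x_3 - 3)^2 \;.$$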

Suggestions: Discuss the functions $f_1$ through $f_7$. All of them have a minimum at $x_1 = x_2 = x_3 = 0$. Some possess additional minima. Study the convergence behavior of the different minimization methods for these functions using different starting points and explain this behavior qualitatively.
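For experimenting outside the example program, the seven functions can be coded directly. This Java sketch (hypothetical class name, not part of E1Min) uses $r^2$ as the basic quantity. Note that $f_5 = r^2 (r^2 - 1)^2$ also vanishes on the whole sphere $r = 1$, and that $f_7$ has a much deeper minimum near $(3, 3, 3)$ than at the origin.

```java
/** The seven test functions of Example Program 10.1 (sketch of the formulas, not of class E1Min). */
final class TestFunctions {

    static double f(int which, double[] x) {
        double q = x[0] * x[0] + x[1] * x[1] + x[2] * x[2];     // r^2
        double r = Math.sqrt(q);
        switch (which) {
            case 1: return q;
            case 2: return Math.pow(r, 10);
            case 3: return r;
            case 4: return -Math.exp(-q);
            case 5: return q * q * q - 2 * q * q + q;              // r^6 - 2 r^4 + r^2 = r^2 (r^2 - 1)^2
            case 6: return q * Math.exp(-q);
            case 7: {
                double qs = (x[0] - 3) * (x[0] - 3) + (x[1] - 3) * (x[1] - 3) + (x[2] - 3) * (x[2] - 3);
                return -Math.exp(-q) - 10 * Math.exp(-qs);         // r'^2 measured from (3, 3, 3)
            }
            default: throw new IllegalArgumentException("function index must be 1..7");
        }
    }

    public static void main(String[] args) {
        double[] origin = {0, 0, 0}, unit = {1, 0, 0}, far = {3, 3, 3};
        for (int k = 1; k <= 7; k++)
            System.out.printf("f%d: origin %.4f, (1,0,0) %.4f, (3,3,3) %.4f%n",
                              k, f(k, origin), f(k, unit), f(k, far));
    }
}
```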

Example Program 10.2: The class E2Min determines the parameters of a distribution from the elements of a sample and demonstrates the use of MinCov

The program solves the problem in Example 10.1. First a sample is drawn from the standard normal distribution. Next the sample is used to estimate the parameters $x_1$ (mean) and $x_2$ (standard deviation) of the population by minimizing the negative of the log-likelihood function (10.18.2). This is done with MinSim. The covariance matrix of the parameters is determined by calling MinCov. The results are presented numerically. The rest of the program performs the graphical display of the sample as a one-dimensional scatter plot and of the fitted function as in Fig. 10.16 (left-hand plots).

Suggestion:  Run  the  program  for  samples  of  different  size. 
Example Program 10.3: The class E3Min demonstrates the use of MinAsy and draws the boundary of a confidence region

The class solves the same problem as the previous example. In addition it computes the asymmetric errors of the parameters using MinAsy. Then the solution, the symmetric errors, the covariance ellipse, and the asymmetric errors are displayed graphically in the $(x_1, x_2)$ plane, and with the help of the method DatanGraphics.drawContour the contour of the confidence region is shown as well. The plot corresponds to Fig. 10.16 (right-hand plots).

Example  Program  10.4:  The  class  E4Min  determines  the  parameters  of  a 
distribution  from  the  histogram  of  a  sample 

The program solves the problem of Examples 10.2 and 10.3. First a sample of size $n_{ev}$ is drawn from a standard normal distribution and a histogram with $n_t$ bins between $t_0 = -5.25$ and $t_{max} = 5.25$ is constructed from the sample. The bin centers are $t_i$ ($i = 1, 2, \ldots, n_t$), and the bin contents are $n_i$. As chosen by the user, either the likelihood function $\ell$ given by (10.18.10) is maximized (i.e., $-\ell$ is minimized) by MinSim and the user function MinLogLikeHistPoison, or the sum of squares $Q$ given by (10.18.12) is minimized, again with MinSim, but using MinHistSumOfSquares.

The results are presented in graphical form. One plot contains the histogram and the fitted Gaussian (i.e., a plot corresponding to the plots on the left-hand side of Figs. 10.17 or 10.18); a second one presents the solution in the $(x_1, x_2)$ plane with symmetric and asymmetric errors, covariance ellipse, and confidence region (corresponding to the plots on the right-hand side of those figures).

Suggestions: (a) Choose $n_{ev} = 100$. Show that for likelihood maximization the errors $\Delta x_1$, $\Delta x_2$ increase if you decrease the number of bins, beginning with $n_t = 100$, but that for the minimization of the sum of squares the number of bins has to be small to get meaningful errors. (b) Show that for $n_{ev} = 1000$ and $n_t = 50$ or $n_t = 20$ there is practically no difference between the results of the methods.


11.  Analysis  of  Variance 


The analysis of variance (or ANOVA), originally developed by R. A. Fisher, concerns testing the hypothesis of equal means of a number of samples. Such problems occur, for example, in the comparison of a series of measurements carried out under different conditions, or in quality control of samples produced by different machines. One tries to discover what influence the changing of external variables (e.g., experimental conditions, the number of a machine) has on a sample. For the simple case of only two samples, this problem can also be solved with Student's difference test (Sect. 8.3).

We speak of one-way analysis of variance, or also one-way classification, when only one external variable is changed. The evaluation of a series of measurements of an object micrometer performed with different microscopes can serve as an example. One speaks of a two-way (or multiway) analysis of variance (two-way classification) when several variables are changed simultaneously. In the example above, if different observers carry out the series of measurements with each microscope, then a two-way analysis of variance can investigate influences of both the observer and the instrument on the result.

11.1  One-Way  Analysis  of  Variance 

Let us consider a sample of size $n$, which can be divided into $t$ groups according to a certain criterion A. Clearly the criterion must be related to the sampling or measuring process. We say that the groups are constructed according to the classification A. We assume that the populations from which the $t$ subsamples are taken are normally distributed with the same variance $\sigma^2$. We now want to test the hypothesis that the mean values of these populations are also equal. If this hypothesis is true, then all of the samples come from the same population. We can then apply the results of Sect. 6.4 (samples from subpopulations). Using the same notation as there, we have $t$ groups of size $n_i$ with

$$ n = \sum_{i=1}^{t} n_i , $$

and we write the $j$th element of the $i$th group as $x_{ij}$. The sample mean of the $i$th group is

$$ \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} , \tag{11.1.1} $$
and the mean of the entire sample is

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{t} \sum_{j=1}^{n_i} x_{ij} = \frac{1}{n} \sum_{i=1}^{t} n_i \bar{x}_i . \tag{11.1.2} $$


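We now construct the sum of squares

$$ \begin{aligned} Q &= \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{t} \sum_{j=1}^{n_i} \left[ (\bar{x}_i - \bar{x}) + (x_{ij} - \bar{x}_i) \right]^2 \\ &= \sum_{i=1}^{t} n_i (\bar{x}_i - \bar{x})^2 + \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + 2 \sum_{i=1}^{t} (\bar{x}_i - \bar{x}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) . \end{aligned} $$

The last term vanishes because of (11.1.1) and (11.1.2). One therefore has

$$ Q = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{t} n_i (\bar{x}_i - \bar{x})^2 + \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 , $$

$$ Q = Q_A + Q_W . \tag{11.1.3} $$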


The first term is the sum of squares between the groups obtained with the classification A. The second term is a sum over the sums of squares within a group. The sum of squares $Q$ is thus decomposed into two sums of squares corresponding to different "sources": the variation of the group means under the classification A and the variation of the measurements within the groups. If our hypothesis is correct, then $Q$ is a sum of squares from a normal distribution, i.e., $Q/\sigma^2$ follows a $\chi^2$-distribution with $n - 1$ degrees of freedom. Correspondingly, for each group the quantity

$$ \frac{Q_i}{\sigma^2} = \frac{1}{\sigma^2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 $$

follows a $\chi^2$-distribution with $n_i - 1$ degrees of freedom. The sum

$$ \frac{Q_W}{\sigma^2} = \sum_{i=1}^{t} \frac{Q_i}{\sigma^2} $$

is then described by a $\chi^2$-distribution with $\sum_i (n_i - 1) = n - t$ degrees of freedom (see Sect. 6.6). Finally, $Q_A/\sigma^2$ follows a $\chi^2$-distribution with $t - 1$ degrees of freedom.

The expressions

$$ s_A^2 = \frac{Q_A}{t - 1} , \qquad s_W^2 = \frac{Q_W}{n - t} \tag{11.1.4} $$

are unbiased estimators of the population variance $\sigma^2$ if the hypothesis is true. (In Sect. 6.5 we called such expressions mean squares.) The ratio

$$ F = s_A^2 / s_W^2 \tag{11.1.5} $$

can thus be used to carry out an $F$-test.

If the hypothesis of equal means is false, then the values $\bar{x}_i$ of the individual groups will be quite different. Thus $s_A^2$ will be relatively large, while $s_W^2$, which is the mean of the variances of the individual groups, will not change much. This means that the ratio (11.1.5) will be large. Therefore one uses a one-sided $F$-test. The hypothesis of equal means is rejected at the significance level $\alpha$ if

$$ F = s_A^2 / s_W^2 > F_{1-\alpha}(t - 1, n - t) . \tag{11.1.6} $$

The sums of squares can be computed according to two equivalent formulas,

$$ \begin{aligned} Q_A &= \sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i \bar{x}_i^2 - n\bar{x}^2 , \\ Q_W &= \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \sum_i \sum_j x_{ij}^2 - \sum_i n_i \bar{x}_i^2 . \end{aligned} \tag{11.1.7} $$


The expression on the right of each line is usually easier to compute. Since each of these sums of squares is then obtained as the difference of two relatively large numbers, one must pay attention to possible rounding errors. Although one only needs the sums $Q_A$ and $Q_W$ in order to compute the ratio $F$, it is recommended to compute $Q$ as well, since one can then perform a check using (11.1.3), i.e., $Q = Q_A + Q_W$. The check is only meaningful when $Q$ is computed with the left-hand form of (11.1.3). Usually the results of an analysis of variance are summarized in a so-called analysis of variance table (or ANOVA table), as shown in Table 11.1.
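The warning about rounding can be made concrete with a small sketch, assuming IEEE double-precision arithmetic (Python's `float`). The two hypothetical helper functions below compute the same sum of squared deviations for a single group, once with a left-hand form and once with a right-hand (difference) form of the type used in (11.1.7); the data carry a large common offset.

```python
def ss_two_pass(x):
    """Sum of squared deviations, left-hand form: sum((x_i - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x)


def ss_one_pass(x):
    """Same quantity, right-hand form: sum(x_i^2) - n * mean^2 (difference of large numbers)."""
    n = len(x)
    mean = sum(x) / n
    return sum(xi * xi for xi in x) - n * mean * mean


# Hypothetical measurements with a large common offset; the exact result is 2.
data = [1e9 + 1.0, 1e9 + 2.0, 1e9 + 3.0]
print("left-hand form: ", ss_two_pass(data))
print("right-hand form:", ss_one_pass(data))
```

With these inputs the left-hand form returns exactly 2.0, whereas the right-hand form is swamped by rounding: squares near $10^{18}$ are only resolved to a spacing of 128 in double precision, so the small differences that make up the answer are lost. Subtracting a provisional constant (for example the first measurement) from all values before using the right-hand form is a common remedy.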

Before carrying out an analysis of variance one must consider whether the requirements are met under which the procedure has been derived. In particular, one must check the assumption of a normal distribution for the measurements within each group. This is by no means certain in every case. If, for example, the measured values are always positive (e.g., the length or weight of an object) and if the standard deviation is of a magnitude comparable to the measured values, then the probability density can be asymmetric and thus not Gaussian. If, however, the original measurements (let us denote them for the moment by $x'$) are transformed using a monotonic transformation such as

$$ x = a \log(x' + b) , \tag{11.1.8} $$

where $a$ and $b$ are appropriately chosen constants, then a normal distribution can often be sufficiently well approximated. Other transformations sometimes used are $x = \sqrt{x'}$ or $x = 1/x'$.

Table 11.1: ANOVA table for one-way classification.

| Source | SS | DF | MS | F |
|---|---|---|---|---|
| Between the groups | $Q_A$ | $t - 1$ | $s_A^2 = Q_A/(t-1)$ | $F = s_A^2/s_W^2$ |
| Within the groups | $Q_W$ | $n - t$ | $s_W^2 = Q_W/(n-t)$ | |
| Sum | $Q$ | $n - 1$ | $s^2 = Q/(n-1)$ | |

Example  11.1:  One-way  analysis  of  variance  of  the  influence  of 
various  drugs 

The spleens of mice with cancer are often attacked particularly strongly. The weight of the spleen can thus serve as a measure of the reaction to various drugs. The drugs (I–III) were used to treat ten mice each; not every group yielded ten measurements (see the values $n_i$ in Table 11.2). Table 11.2 contains the measured spleen weights, which have already been transformed according to $x = \log x'$, where $x'$ is the weight in grams. Most of the calculation is presented in Table 11.2. Table 11.3 contains the resulting ANOVA table. Since the observed ratio $F = 0.377$ lies far below the critical value $F_{0.95}(2, 24) = 3.4$ for a significance level of 5%, one cannot reject the hypothesis of equal mean values. The experiment thus showed no significant difference in the effectiveness of the three drugs. ■
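Table 11.2: Data for Example 11.1.

| Experiment number | Group I | Group II | Group III | All groups |
|---|---|---|---|---|
| 1 | 19 | 40 | 32 | |
| 2 | 45 | 28 | 26 | |
| 3 | 26 | 26 | 30 | |
| 4 | 23 | 15 | 17 | |
| 5 | 36 | 24 | 23 | |
| 6 | 23 | 26 | 24 | |
| 7 | 26 | 36 | 29 | |
| 8 | 33 | 27 | 20 | |
| 9 | 22 | 28 | | |
| 10 | | 19 | | |
| $\sum_j x_{ij}$ | 253 | 269 | 201 | $\sum_i \sum_j x_{ij} = 723$ |
| $n_i$ | 9 | 10 | 8 | $n = 27$ |
| $\bar{x}_i$ | 28.11 | 26.90 | 25.13 | $\bar{x} = 26.78$ |
| $\bar{x}_i^2$ | 790.23 | 723.61 | 631.52 | $\sum_i n_i \bar{x}_i^2 = 19398$ |
| | | | | $n\bar{x}^2 = 19360$ |
| | | | | $\sum_i \sum_j x_{ij}^2 = 20607$ |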

Table 11.3: ANOVA table for Example 11.1.

| Source | SS | DF | MS | F |
|---|---|---|---|---|
| Between the groups | 38 | 2 | 19.0 | 0.377 |
| Within the groups | 1209 | 24 | 50.4 | |
| Sum | 1247 | 26 | 47.9 | |
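As a check on Tables 11.2 and 11.3, the following minimal Python sketch computes $Q_A$, $Q_W$, $Q$, and $F$ directly from the left-hand forms of the sums of squares. The function name `one_way_anova` is hypothetical and not part of the book's Java classes; the critical value 3.4 is the one quoted in Example 11.1, not computed here.

```python
def one_way_anova(groups):
    """One-way classification (Sect. 11.1). groups is a list of lists of measurements.
    Returns the sums of squares, degrees of freedom and the ratio F = s_A^2 / s_W^2."""
    n = sum(len(g) for g in groups)
    t = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]                      # group means (11.1.1)
    q_a = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    q_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    q = sum((x - grand_mean) ** 2 for g in groups for x in g)       # for the check Q = Q_A + Q_W
    s2_a = q_a / (t - 1)                                            # mean squares (11.1.4)
    s2_w = q_w / (n - t)
    return {"Q_A": q_a, "Q_W": q_w, "Q": q,
            "df_A": t - 1, "df_W": n - t, "F": s2_a / s2_w}         # ratio (11.1.5)


# Transformed spleen weights of Table 11.2, groups I, II, III
groups = [
    [19, 45, 26, 23, 36, 23, 26, 33, 22],
    [40, 28, 26, 15, 24, 26, 36, 27, 28, 19],
    [32, 26, 30, 17, 23, 24, 29, 20],
]
res = one_way_anova(groups)
F_CRIT = 3.4  # F_{0.95}(2, 24) as quoted in Example 11.1
print(res)
print("reject H0" if res["F"] > F_CRIT else "H0 not rejected")
```

Running it should give $Q_A \approx 38.0$, $Q_W \approx 1208.7$, $F \approx 0.377$ and the decision not to reject the hypothesis; $Q_A + Q_W$ agrees with $Q$ up to floating-point rounding, which is the check recommended in Sect. 11.1.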

11.2  Two-Way  Analysis  of  Variance 

Before we turn to analysis of variance with two external variables, we would like to examine more carefully the results obtained for one-way classification. We denoted the $j$th measurement of the quantity $x$ in group $i$ by $x_{ij}$. We now assume for simplicity that each group contains the same number of measurements, i.e., $n_i = J$. In addition we denote the total number of groups by $I$. The classification into individual groups was done according to the criterion A, e.g., the production number of a microscope, by which the groups can be distinguished. The labeling according to measurement and group is illustrated in Table 11.4. We can write the individual means of the groups in the form

$$ \bar{x}_{i.} = \frac{1}{J} \sum_{j=1}^{J} x_{ij} . \tag{11.2.1} $$


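Table 11.4: One-way classification (columns: classes $A_1, \ldots, A_I$ of classification A).

| Measurement number | $A_1$ | $A_2$ | … | $A_i$ | … | $A_I$ |
|---|---|---|---|---|---|---|
| 1 | $x_{11}$ | $x_{21}$ | … | $x_{i1}$ | … | $x_{I1}$ |
| 2 | $x_{12}$ | $x_{22}$ | … | $x_{i2}$ | … | $x_{I2}$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ |
| $j$ | $x_{1j}$ | $x_{2j}$ | … | $x_{ij}$ | … | $x_{Ij}$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ |
| $J$ | $x_{1J}$ | $x_{2J}$ | … | $x_{iJ}$ | … | $x_{IJ}$ |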

Here a point in place of an index denotes summation over that index; together with the bar (division by the number of terms) it denotes the mean over that index. This notation allows a simple generalization to a larger number of indices. The analysis of variance with one-way classification is based on the assumption that the measurements within a group only differ by the measurement errors, which follow a normal distribution with a mean of zero and variance $\sigma^2$. That is, we consider the model

$$ x_{ij} = \mu_i + \varepsilon_{ij} . \tag{11.2.2} $$

The goal of an analysis of variance was to test the hypothesis

$$ H_0(\mu_1 = \mu_2 = \cdots = \mu_I = \mu) . \tag{11.2.3} $$


By choosing measurements out of a certain group $i$ and by applying the maximum-likelihood method to (11.2.2) one obtains the estimator

$$ \tilde{\mu}_i = \bar{x}_{i.} . \tag{11.2.4} $$

If $H_0$ is true, then one has

$$ \tilde{\mu} = \bar{x} = \frac{1}{IJ} \sum_i \sum_j x_{ij} = \frac{1}{I} \sum_i \bar{x}_{i.} . \tag{11.2.5} $$

The (composite) alternative hypothesis is that not all of the $\mu_i$ are equal. We want, however, to retain the concept of the overall mean and we write

$$ \mu_i = \mu + a_i . $$

The model (11.2.2) then has the form

$$ x_{ij} = \mu + a_i + \varepsilon_{ij} . \tag{11.2.6} $$

Between the quantities $a_i$, which represent a measure of the deviation of the mean for the $i$th group from the overall mean, one has the relation

$$ \sum_i a_i = 0 . \tag{11.2.7} $$

The maximum-likelihood estimators for the $a_i$ are

$$ \tilde{a}_i = \bar{x}_{i.} - \bar{x} . \tag{11.2.8} $$


The one-way analysis of variance of Sect. 11.1 was derived from the identity

$$ x_{ij} - \bar{x} = (\bar{x}_{i.} - \bar{x}) + (x_{ij} - \bar{x}_{i.}) , \tag{11.2.9} $$

which describes the deviation of the individual measurements from the overall mean. The sum of squares $Q$ of these deviations could then be decomposed into the terms $Q_A$ and $Q_W$; cf. (11.1.3).

After this preparation we now consider a two-way classification, where the measurements are divided into groups according to two criteria, A and B. The measurements belong to class $A_i$, which is given by the classification according to A, and also to class $B_j$. The index $k$ denotes the measurement number within the group that belongs to both class $A_i$ and class $B_j$.

A two-way classification is said to be crossed when a certain classification $B_j$ has the same meaning for all classes $A_i$. If, for example, microscopes are classified by A and observers by B, and if each observer carries out a measurement with each microscope, then the classifications are crossed. If, however, one compares the microscopes in different laboratories, and if therefore in each laboratory a different group of $J$ observers makes measurements with a certain microscope $i$, then the classification B is said to be nested in A. The index $j$ then merely counts the classes B within a certain class A.
The simplest case is a crossed classification with only one observation. Since then $k = 1$ for all observations $x_{ijk}$, we can drop the index $k$. One uses the model

$$ x_{ij} = \mu + a_i + b_j + \varepsilon_{ij} , \qquad \sum_i a_i = 0 , \qquad \sum_j b_j = 0 , \tag{11.2.10} $$

where $\varepsilon_{ij}$ is normally distributed with mean zero and variance $\sigma^2$. The null hypothesis says that by classification according to A or B, no deviation from the overall mean occurs. We write this in the form of two individual hypotheses,

$$ H_0^{(A)}(a_1 = a_2 = \cdots = a_I = 0) , \qquad H_0^{(B)}(b_1 = b_2 = \cdots = b_J = 0) . \tag{11.2.11} $$


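The least-squares estimators for $a_i$ and $b_j$ are

$$ \tilde{a}_i = \bar{x}_{i.} - \bar{x} , \qquad \tilde{b}_j = \bar{x}_{.j} - \bar{x} . $$

In analogy to Eq. (11.2.9) we can write

$$ x_{ij} - \bar{x} = (\bar{x}_{i.} - \bar{x}) + (\bar{x}_{.j} - \bar{x}) + (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}) . \tag{11.2.12} $$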


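In a similar way the sum of squares can be written

$$ \sum_i \sum_j (x_{ij} - \bar{x})^2 = Q = Q_A + Q_B + Q_W , \tag{11.2.13} $$

where

$$ \begin{aligned} Q_A &= J \sum_i (\bar{x}_{i.} - \bar{x})^2 = J \sum_i \bar{x}_{i.}^2 - IJ\bar{x}^2 , \\ Q_B &= I \sum_j (\bar{x}_{.j} - \bar{x})^2 = I \sum_j \bar{x}_{.j}^2 - IJ\bar{x}^2 , \\ Q_W &= \sum_i \sum_j (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - J \sum_i \bar{x}_{i.}^2 - I \sum_j \bar{x}_{.j}^2 + IJ\bar{x}^2 . \end{aligned} \tag{11.2.14} $$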


When divided by the corresponding number of degrees of freedom, these sums are estimators of $\sigma^2$, provided that the hypotheses (11.2.11) are correct. The hypotheses $H_0^{(A)}$ and $H_0^{(B)}$ can be tested individually by using the ratios

$$ F^{(A)} = s_A^2 / s_W^2 , \qquad F^{(B)} = s_B^2 / s_W^2 . \tag{11.2.15} $$

Here one uses one-sided $F$-tests as in Sect. 11.1. The overall situation can be summarized in an ANOVA table (Table 11.5).

If more than one observation is made in each group, then the crossed classification can be generalized in various ways. The most important generalization involves interaction between the classes. One then has the model

$$ x_{ijk} = \mu + a_i + b_j + (ab)_{ij} + \varepsilon_{ijk} . \tag{11.2.16} $$

The quantity $(ab)_{ij}$ is called the interaction between the classes $A_i$ and $B_j$. It describes the deviation from the group mean that occurs because of the specific interaction of $A_i$ and $B_j$. The parameters $a_i$, $b_j$, $(ab)_{ij}$ are related by

$$ \sum_i a_i = \sum_j b_j = \sum_i \sum_j (ab)_{ij} = 0 . \tag{11.2.17} $$

Their maximum-likelihood estimators are

$$ \tilde{a}_i = \bar{x}_{i..} - \bar{x} , \qquad \tilde{b}_j = \bar{x}_{.j.} - \bar{x} , \qquad \widetilde{(ab)}_{ij} = \bar{x}_{ij.} + \bar{x} - \bar{x}_{i..} - \bar{x}_{.j.} . \tag{11.2.18} $$


Table 11.5: Analysis of variance table for crossed two-way classification with only one observation.

| Source | SS | DF | MS |
|---|---|---|---|
| Class. A | $Q_A$ | $I - 1$ | $s_A^2 = Q_A/(I-1)$ |
| Class. B | $Q_B$ | $J - 1$ | $s_B^2 = Q_B/(J-1)$ |
| Within groups | $Q_W$ | $(I-1)(J-1)$ | $s_W^2 = Q_W/[(I-1)(J-1)]$ |
| Sum | $Q$ | $IJ - 1$ | $s^2 = Q/(IJ-1)$ |
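For the crossed classification with one observation per cell, the sums of squares (11.2.13) and (11.2.14) and the ratios (11.2.15) can be sketched as follows. The function name `two_way_single` and the 2 × 2 example data are hypothetical; the example is small enough to verify by hand.

```python
def two_way_single(x):
    """Crossed two-way classification with one observation per cell.
    x is an I x J nested list with x[i][j] = x_ij (rows: classes A, columns: classes B).
    Returns Q_A, Q_B, Q_W, Q and the ratios F(A), F(B) of (11.2.15)."""
    I, J = len(x), len(x[0])
    mean = sum(sum(row) for row in x) / (I * J)                       # overall mean
    row_mean = [sum(row) / J for row in x]                             # xbar_{i.}
    col_mean = [sum(x[i][j] for i in range(I)) / I for j in range(J)]  # xbar_{.j}
    q_a = J * sum((m - mean) ** 2 for m in row_mean)
    q_b = I * sum((m - mean) ** 2 for m in col_mean)
    q_w = sum((x[i][j] - row_mean[i] - col_mean[j] + mean) ** 2
              for i in range(I) for j in range(J))
    q = sum((v - mean) ** 2 for row in x for v in row)
    s2_w = q_w / ((I - 1) * (J - 1))                                   # mean square within
    f_a = (q_a / (I - 1)) / s2_w
    f_b = (q_b / (J - 1)) / s2_w
    return q_a, q_b, q_w, q, f_a, f_b


# Hypothetical 2 x 2 table
q_a, q_b, q_w, q, f_a, f_b = two_way_single([[1.0, 2.0], [3.0, 5.0]])
print(q_a, q_b, q_w, q)   # 6.25 2.25 0.25 8.75
print(f_a, f_b)           # 25.0 9.0
```

The total $Q = 8.75$ equals $Q_A + Q_B + Q_W$, as required by (11.2.13). The computation uses the left-hand forms of (11.2.14), which avoid the rounding problems discussed in Sect. 11.1.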


The null hypothesis can be divided into three individual hypotheses,

$$ \begin{aligned} & H_0^{(A)}(a_i = 0;\ i = 1, 2, \ldots, I) , \qquad H_0^{(B)}(b_j = 0;\ j = 1, 2, \ldots, J) , \\ & H_0^{(AB)}((ab)_{ij} = 0;\ i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, J) , \end{aligned} \tag{11.2.19} $$

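which can then be tested individually. The analysis of variance is based on the identity

$$ x_{ijk} - \bar{x} = (\bar{x}_{i..} - \bar{x}) + (\bar{x}_{.j.} - \bar{x}) + (\bar{x}_{ij.} + \bar{x} - \bar{x}_{i..} - \bar{x}_{.j.}) + (x_{ijk} - \bar{x}_{ij.}) , \tag{11.2.20} $$

which allows the decomposition of the sum of squares of deviations into four terms,

$$ \begin{aligned} Q &= \sum_i \sum_j \sum_k (x_{ijk} - \bar{x})^2 = Q_A + Q_B + Q_{AB} + Q_W , \\ Q_A &= JK \sum_i (\bar{x}_{i..} - \bar{x})^2 , \\ Q_B &= IK \sum_j (\bar{x}_{.j.} - \bar{x})^2 , \\ Q_{AB} &= K \sum_i \sum_j (\bar{x}_{ij.} + \bar{x} - \bar{x}_{i..} - \bar{x}_{.j.})^2 , \\ Q_W &= \sum_i \sum_j \sum_k (x_{ijk} - \bar{x}_{ij.})^2 . \end{aligned} \tag{11.2.21} $$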

The degrees of freedom and mean squares as well as the $F$-ratios, which can be used for testing the hypotheses, are given in Table 11.6.

Table 11.6: Analysis of variance table for crossed two-way classification.

| Source | SS | DF | MS | F |
|---|---|---|---|---|
| Class. A | $Q_A$ | $I - 1$ | $s_A^2 = Q_A/(I-1)$ | $F^{(A)} = s_A^2/s_W^2$ |
| Class. B | $Q_B$ | $J - 1$ | $s_B^2 = Q_B/(J-1)$ | $F^{(B)} = s_B^2/s_W^2$ |
| Interaction | $Q_{AB}$ | $(I-1)(J-1)$ | $s_{AB}^2 = Q_{AB}/[(I-1)(J-1)]$ | $F^{(AB)} = s_{AB}^2/s_W^2$ |
| Within groups | $Q_W$ | $IJ(K-1)$ | $s_W^2 = Q_W/[IJ(K-1)]$ | |
| Sum | $Q$ | $IJK - 1$ | $s^2 = Q/(IJK-1)$ | |

Finally, we give the simplest case of a nested two-way classification. Because the classification B is only defined within the individual classes of A, the terms $b_j$ and $(ab)_{ij}$ of Eq. (11.2.16) are not defined, since they imply a sum over $i$ for fixed $j$. Therefore one uses the model

$$ x_{ijk} = \mu + a_i + b_{ij} + \varepsilon_{ijk} \quad \text{with} \quad \sum_i a_i = 0 , \quad \sum_i \sum_j b_{ij} = 0 , \tag{11.2.22} $$

whose parameters are estimated by $\tilde{a}_i = \bar{x}_{i..} - \bar{x}$ and $\tilde{b}_{ij} = \bar{x}_{ij.} - \bar{x}_{i..}$. The term $b_{ij}$ is a measure of the deviation of the measurements of class $B_j$ within class $A_i$ from the overall mean of class $A_i$. The null hypothesis consists of

$$ \begin{aligned} & H_0^{(A)}(a_i = 0;\ i = 1, 2, \ldots, I) , \\ & H_0^{(B)}(b_{ij} = 0;\ i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, J) . \end{aligned} \tag{11.2.23} $$

An analysis of variance for testing these hypotheses can be carried out with the help of Table 11.7. Here one has

$$ \begin{aligned} Q_A &= JK \sum_i (\bar{x}_{i..} - \bar{x})^2 , \\ Q_{B(A)} &= K \sum_i \sum_j (\bar{x}_{ij.} - \bar{x}_{i..})^2 , \\ Q_W &= \sum_i \sum_j \sum_k (x_{ijk} - \bar{x}_{ij.})^2 , \\ Q &= Q_A + Q_{B(A)} + Q_W = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x})^2 . \end{aligned} $$

In a similar way one can construct various models for two-way or multiple classification. For each model the total sum of squares is decomposed into a certain sum of individual sums of squares, which, when divided by the corresponding number of degrees of freedom, can be used to carry out an $F$-test. With this, the hypotheses implied by the model can be tested.

Some models are, at least formally, contained within others. For example, by comparing Tables 11.6 and 11.7 one finds the relation

$$ Q_{B(A)} = Q_B + Q_{AB} . \tag{11.2.24} $$

A similar relation holds for the corresponding numbers of degrees of freedom,

$$ f_{B(A)} = f_B + f_{AB} . \tag{11.2.25} $$


Table 11.7: Analysis of variance table for nested two-way classification.

| Source | SS | DF | MS | F |
|---|---|---|---|---|
| Class. A | $Q_A$ | $I - 1$ | $s_A^2 = Q_A/(I-1)$ | $F^{(A)} = s_A^2/s_{B(A)}^2$ |
| Within A | $Q_{B(A)}$ | $I(J-1)$ | $s_{B(A)}^2 = Q_{B(A)}/[I(J-1)]$ | |
| Within groups | $Q_W$ | $IJ(K-1)$ | $s_W^2 = Q_W/[IJ(K-1)]$ | |
| Sum | $Q$ | $IJK - 1$ | $s^2 = Q/(IJK-1)$ | |
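The nested decomposition and relation (11.2.24) can be checked numerically with a sketch like the following. It computes $Q_A$, $Q_{B(A)}$, $Q_W$, and $Q$ with their degrees of freedom for the nested model and, treating the same numbers as a crossed layout, also $Q_B$ and $Q_{AB}$ of (11.2.21). The function name `nested_and_crossed` and the 2 × 2 × 2 data set are hypothetical.

```python
def nested_and_crossed(x):
    """x[i][j][k] = x_ijk with I classes A, J classes B per class A, K observations per group.
    Returns the nested sums of squares as {name: (SS, DF)} plus the crossed Q_B and Q_AB."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / K for j in range(J)] for i in range(I)]    # xbar_{ij.}
    a_mean = [sum(row) / J for row in cell]                             # xbar_{i..}
    b_mean = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]  # xbar_{.j.} (crossed view)
    mean = sum(a_mean) / I                                               # overall mean
    cells = [(i, j) for i in range(I) for j in range(J)]
    nested = {
        "Q_A": (J * K * sum((a - mean) ** 2 for a in a_mean), I - 1),
        "Q_B(A)": (K * sum((cell[i][j] - a_mean[i]) ** 2 for i, j in cells), I * (J - 1)),
        "Q_W": (sum((v - cell[i][j]) ** 2 for i, j in cells for v in x[i][j]), I * J * (K - 1)),
        "Q": (sum((v - mean) ** 2 for i, j in cells for v in x[i][j]), I * J * K - 1),
    }
    q_b = I * K * sum((b - mean) ** 2 for b in b_mean)
    q_ab = K * sum((cell[i][j] + mean - a_mean[i] - b_mean[j]) ** 2 for i, j in cells)
    return nested, q_b, q_ab


# Hypothetical data: I = 2, J = 2, K = 2
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 7.0], [6.0, 9.0]]]
nested, q_b, q_ab = nested_and_crossed(x)
for name, (ss, df) in nested.items():
    print(f"{name:7s} SS = {ss:8.3f}  DF = {df}  MS = {ss / df:8.3f}")
print("Q_B + Q_AB =", q_b + q_ab)   # equals Q_B(A), cf. (11.2.24)
```

The degrees of freedom printed for $Q_{B(A)}$, namely $I(J-1)$, likewise equal $(J-1) + (I-1)(J-1)$, in line with (11.2.25).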
Example 11.2: Two-way analysis of variance in cancer research

Two groups of rats are injected with the nucleoside thymidine containing traces of tritium, a radioactive isotope of hydrogen. In addition, one of the groups receives a certain carcinogen. The incorporation of thymidine into the skin of the rats is investigated as a function of time by measuring the number of tritium decays per unit area of skin. The classifications are crossed since the time dependence is controlled in the same way for both series of test animals. The measurements are compiled in Table 11.8. The numbers are already transformed from the original counting rates $x'$ according to $x = 50 \log x' - 100$. The results, obtained with the class AnalysisOfVariance, are shown in Table 11.9. There is no doubt that the presence or absence of the carcinogen (classification A) has an influence on the result, since the ratio $F^{(A)}$ is very large. We now want to test the existence of a time dependence (classification B) and of an interaction between A and B at a significance level of $\alpha = 0.01$. From Table I.8 we find $F_{0.99}(9, 60) = 2.72$. Since both $F^{(B)} = 12.65$ and $F^{(AB)} = 14.74$ exceed this value, the hypotheses of time independence and vanishing interaction must be rejected. Table 11.9 also contains the values of $\alpha$ below which the hypotheses would not need to be rejected. They are very small. ■


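Table 11.8: Data for Example 11.2 (columns 4 to 48: time after injection in h).

| Injection | Obs. no. | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 48 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Thymidine | 1 | 34 | 54 | 44 | 51 | 62 | 61 | 59 | 66 | 52 | 52 |
| Thymidine | 2 | 40 | 57 | 52 | 46 | 61 | 70 | 67 | 59 | 63 | 50 |
| Thymidine | 3 | 38 | 40 | 53 | 51 | 54 | 64 | 58 | 67 | 60 | 44 |
| Thymidine | 4 | 36 | 43 | 51 | 49 | 60 | 68 | 66 | 58 | 59 | 52 |
| Thymidine and carcinogen | 1 | 28 | 23 | 42 | 43 | 31 | 32 | 25 | 24 | 26 | 26 |
| Thymidine and carcinogen | 2 | 32 | 23 | 41 | 48 | 45 | 38 | 27 | 26 | 31 | 27 |
| Thymidine and carcinogen | 3 | 34 | 29 | 34 | 36 | 41 | 32 | 27 | 32 | 25 | 27 |
| Thymidine and carcinogen | 4 | 27 | 30 | 39 | 43 | 37 | 34 | 28 | 30 | 26 | 30 |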
Table 11.9: Printout from Example 11.2.

Analysis of variance table

| Source | Sum of squares | Degrees of freedom | Mean square | F ratio | Alpha |
|---|---|---|---|---|---|
| A | 9945.80 | 1 | 9945.80 | 590.54725 | 0.00E-10 |
| B | 1917.50 | 9 | 213.06 | 12.65050 | 0.54E-10 |
| INT. | 2234.95 | 9 | 248.33 | 14.74485 | 0.03E-10 |
| W | 1010.50 | 60 | 16.84 | | |
| TTL. | 15108.75 | 79 | 191.25 | | |
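The sums of squares and $F$ ratios of Table 11.9 can be reproduced independently of the book's Java class AnalysisOfVariance. The following Python sketch (the function name `crossed_anova` is hypothetical) evaluates (11.2.21) and the ratios of Table 11.6 for the data of Table 11.8, with $I = 2$ injection types, $J = 10$ times, and $K = 4$ observations per cell. It does not compute the Alpha column, which would require the distribution function of the $F$-distribution.

```python
def crossed_anova(x):
    """Crossed two-way classification with interaction (Table 11.6).
    x[i][j][k] = x_ijk: I classes A, J classes B, K observations per cell.
    Returns {source: (SS, DF, MS, F)}; F is None for the within-groups row 'W'."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / K for j in range(J)] for i in range(I)]    # xbar_{ij.}
    a = [sum(row) / J for row in cell]                                  # xbar_{i..}
    b = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]       # xbar_{.j.}
    m = sum(a) / I                                                       # xbar
    ss = {
        "A": J * K * sum((ai - m) ** 2 for ai in a),
        "B": I * K * sum((bj - m) ** 2 for bj in b),
        "INT.": K * sum((cell[i][j] + m - a[i] - b[j]) ** 2
                        for i in range(I) for j in range(J)),
        "W": sum((v - cell[i][j]) ** 2
                 for i in range(I) for j in range(J) for v in x[i][j]),
    }
    df = {"A": I - 1, "B": J - 1, "INT.": (I - 1) * (J - 1), "W": I * J * (K - 1)}
    s2_w = ss["W"] / df["W"]
    table = {src: (ss[src], df[src], ss[src] / df[src], ss[src] / df[src] / s2_w)
             for src in ("A", "B", "INT.")}
    table["W"] = (ss["W"], df["W"], s2_w, None)
    return table


# Table 11.8: rows = observations 1-4, columns = 4, 8, ..., 36, 48 h after injection
thymidine = [
    [34, 54, 44, 51, 62, 61, 59, 66, 52, 52],
    [40, 57, 52, 46, 61, 70, 67, 59, 63, 50],
    [38, 40, 53, 51, 54, 64, 58, 67, 60, 44],
    [36, 43, 51, 49, 60, 68, 66, 58, 59, 52],
]
carcinogen = [
    [28, 23, 42, 43, 31, 32, 25, 24, 26, 26],
    [32, 23, 41, 48, 45, 38, 27, 26, 31, 27],
    [34, 29, 34, 36, 41, 32, 27, 32, 25, 27],
    [27, 30, 39, 43, 37, 34, 28, 30, 26, 30],
]
# rearrange into x[i][j][k]: i = injection type, j = time, k = observation number
x = [[[obs[j] for obs in rows] for j in range(10)] for rows in (thymidine, carcinogen)]

for src, (s, d, ms, f) in crossed_anova(x).items():
    print(f"{src:5s} {s:10.2f} {d:3d} {ms:9.2f}" + ("" if f is None else f" {f:11.5f}"))
```

The printed sums of squares (9945.80, 1917.50, 2234.95, 1010.50) and ratios (590.54725, 12.65050, 14.74485) agree with Table 11.9.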

11.3  Java  Class  and  Example  Programs 

Java  Class 

AnalysisOfVariance performs a crossed as well as a nested two-way analysis of variance.


Example Program 11.1: The class E1Anova demonstrates the use of AnalysisOfVariance

The short program analyzes the data of Example 11.2. Data and output are presented as in Table 11.9.

Example  Program  11.2:  The  class  E2Anova  simulates  data  and  performs 
an  analysis  of  variance  on  them 

The program allows interactive input of numerical values for $\sigma$, the quantities $I$, $J$, $K$, and three further parameters $\Delta_i$, $\Delta_j$, $\Delta_k$. It generates data of the simple form

$$ x_{ijk} = i\Delta_i + j\Delta_j + k\Delta_k + \varepsilon_{ijk} . $$

Here the quantities $\varepsilon_{ijk}$ are taken from a normal distribution with zero mean and standard deviation $\sigma$. An analysis of variance is performed on the data and the results are presented for crossed and for nested two-way classification.

Suggestion: Take a value $\neq 0$ for only one of the parameters $\Delta_i$, $\Delta_j$, $\Delta_k$ and interpret the resulting analysis of variance table.
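The data model of E2Anova can be imitated in a few lines. This Python sketch is not a port of E2Anova: the names `simulate` and `crossed_f_ratios` are hypothetical, Python's `random.Random.gauss` supplies the normal deviates, and only the crossed $F$ ratios of Table 11.6 are computed. With $\Delta_i \neq 0$ and $\Delta_j = \Delta_k = 0$, as in the suggestion, $F^{(A)}$ should come out large while $F^{(B)}$ and $F^{(AB)}$ fluctuate around values of order one.

```python
import random


def crossed_f_ratios(x):
    """F ratios F(A), F(B), F(AB) of Table 11.6 for x[i][j][k] = x_ijk."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / K for j in range(J)] for i in range(I)]
    a = [sum(row) / J for row in cell]
    b = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]
    m = sum(a) / I
    q_a = J * K * sum((v - m) ** 2 for v in a)
    q_b = I * K * sum((v - m) ** 2 for v in b)
    q_ab = K * sum((cell[i][j] + m - a[i] - b[j]) ** 2
                   for i in range(I) for j in range(J))
    q_w = sum((v - cell[i][j]) ** 2
              for i in range(I) for j in range(J) for v in x[i][j])
    s2_w = q_w / (I * J * (K - 1))
    return (q_a / (I - 1) / s2_w,
            q_b / (J - 1) / s2_w,
            q_ab / ((I - 1) * (J - 1)) / s2_w)


def simulate(sigma, I, J, K, d_i, d_j, d_k, seed=1):
    """x_ijk = i*d_i + j*d_j + k*d_k + eps_ijk with indices starting at 1 and
    eps_ijk drawn from a normal distribution with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)
    return [[[i * d_i + j * d_j + k * d_k + rng.gauss(0.0, sigma)
              for k in range(1, K + 1)]
             for j in range(1, J + 1)]
            for i in range(1, I + 1)]


# Only Delta_i is non-zero, as in the suggestion
x = simulate(sigma=1.0, I=3, J=4, K=5, d_i=3.0, d_j=0.0, d_k=0.0)
f_a, f_b, f_ab = crossed_f_ratios(x)
print(f"F(A) = {f_a:.1f}   F(B) = {f_b:.2f}   F(AB) = {f_ab:.2f}")
```

Setting only $\Delta_k \neq 0$ instead adds a trend within each cell; it ends up entirely in $Q_W$ and therefore shrinks all three ratios.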


12.  Linear  and  Polynomial  Regression 


The fitting of a linear function (or, more generally, of a polynomial) to measured data that depend on a controlled variable is probably the most commonly occurring task in data analysis. This procedure is also referred to as linear (or polynomial) regression. Although we have already treated this problem in Sect. 9.4.1, we take it up again here in greater detail. Here we will use different numerical methods, emphasize the most appropriate choice for the order of the polynomial, treat in detail the question of confidence limits, and also give a procedure for the case where the measurement errors are not known.

12.1  Orthogonal  Polynomials 

In Sect. 9.4.1 we have already fitted a polynomial of order $r - 1$,

$$ \eta(t) = x_1 + x_2 t + \cdots + x_r t^{r-1} , \tag{12.1.1} $$

to measured values $y_i(t_i)$, which corresponded to a given value $t_i$ of the controlled variable $t$. Here the quantities $\eta(t)$ were the true values of the measured quantities $y(t)$. In Example 9.2 we saw that there can be a maximum reasonable order for the polynomial, beyond which a further increase in the order gives no significant improvement in the fit. We want here to pursue the question of how to find the optimal order of the polynomial. Example 9.2 shows that when increasing the order, all of the coefficients $x_1, x_2, \ldots$ change and that all of the coefficients are in general correlated. Because of this the situation becomes difficult to judge. These difficulties are avoided by using orthogonal polynomials.

Instead of (12.1.1) one describes the data by the expression

$$ \eta(t) = x_1 f_1(t) + x_2 f_2(t) + \cdots + x_r f_r(t) . \tag{12.1.2} $$

Here the quantities $f_j$ are polynomials of order $j - 1$,

$$ f_j(t) = \sum_{k=1}^{j} b_{jk} t^{k-1} . \tag{12.1.3} $$


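These are chosen to satisfy the condition of orthogonality (or more precisely the condition of orthonormality) with respect to the values $t_i$ and the measurement weights $g_i = 1/\sigma_i^2$,

$$ \sum_{i=1}^{N} g_i f_j(t_i) f_k(t_i) = \delta_{jk} . \tag{12.1.4} $$

Defining

$$ A_{ij} = f_j(t_i) , \qquad A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1r} \\ A_{21} & A_{22} & \cdots & A_{2r} \\ \vdots & \vdots & & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{Nr} \end{pmatrix} , \tag{12.1.5} $$

and using the matrix notation (9.2.9)

$$ G_y = \begin{pmatrix} g_1 & & & 0 \\ & g_2 & & \\ & & \ddots & \\ 0 & & & g_N \end{pmatrix} , \tag{12.1.6} $$

Eq. (12.1.4) has the simple form

$$ A^{T} G_y A = I . \tag{12.1.7} $$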


The least-squares requirement,

$$ M = \sum_{i=1}^{N} g_i \left( y_i(t_i) - \sum_{j=1}^{r} x_j f_j(t_i) \right)^2 = (y - Ax)^{T} G_y (y - Ax) = \min , \tag{12.1.8} $$

is of the form (9.2.19) and thus has the solution (9.2.26)

$$ \tilde{x} = A^{T} G_y y , \tag{12.1.9} $$

where we have used (9.2.18) and (12.1.7). Because of (9.2.27) and (12.1.7) one has

$$ C_{\tilde{x}} = I , \tag{12.1.10} $$

i.e., the covariance matrix of the coefficients $\tilde{x}_1, \ldots, \tilde{x}_r$ is simply the unit matrix. In particular, the coefficients are uncorrelated.

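We now discuss the procedure for determining the matrix elements

$$ A_{ij} = f_j(t_i) = \sum_{k=1}^{j} b_{jk} t_i^{k-1} . \tag{12.1.11} $$

For $j = 1$, the orthogonality condition gives

$$ \sum_i g_i b_{11}^2 = 1 , \qquad b_{11} = 1 \Big/ \sqrt{\textstyle\sum_i g_i} . \tag{12.1.12} $$

For $j = 2$ there are two orthogonality conditions. First we obtain

$$ \sum_i g_i f_2(t_i) f_1(t_i) = \sum_i g_i (b_{21} + b_{22} t_i) b_{11} = 0 $$

and from this

$$ b_{21} = -b_{22} \frac{\sum_i g_i t_i}{\sum_i g_i} = -b_{22} \bar{t} , \tag{12.1.13} $$

where

$$ \bar{t} = \sum_i g_i t_i \Big/ \sum_i g_i \tag{12.1.14} $$

is the weighted mean of the values of the controlled variable $t_i$ at which the measurements were made. The second orthogonality condition for $j = 2$ gives

$$ \sum_i g_i [f_2(t_i)]^2 = \sum_i g_i (b_{21} + b_{22} t_i)^2 = \sum_i g_i b_{22}^2 (t_i - \bar{t})^2 = 1 $$

or

$$ b_{22} = 1 \Big/ \sqrt{\textstyle\sum_i g_i (t_i - \bar{t})^2} . \tag{12.1.15} $$

Substitution into (12.1.13) gives $b_{21}$.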

For $j > 2$ one can obtain the values of $A_{ij} = f_j(t_i)$ recursively from the quantities for $j - 1$ and $j - 2$. We make the ansatz

$$ \gamma f_j(t_i) = (t_i - \alpha) f_{j-1}(t_i) - \beta f_{j-2}(t_i) , \tag{12.1.16} $$

multiply by $g_i f_{j-1}(t_i)$, sum over $i$, and obtain

$$ \gamma \sum_i g_i f_j(t_i) f_{j-1}(t_i) = 0 = \sum_i g_i t_i [f_{j-1}(t_i)]^2 - \alpha \sum_i g_i [f_{j-1}(t_i)]^2 - \beta \sum_i g_i f_{j-1}(t_i) f_{j-2}(t_i) . $$

Because of the orthogonality condition, the sum in the second term on the right-hand side is equal to unity, and the third term vanishes. Thus one has

$$ \alpha = \sum_i g_i t_i [f_{j-1}(t_i)]^2 . \tag{12.1.17} $$

By multiplication of (12.1.16) by $g_i f_{j-2}(t_i)$ and summing one obtains in the same way

$$ \beta = \sum_i g_i t_i f_{j-1}(t_i) f_{j-2}(t_i) . \tag{12.1.18} $$

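Finally, by computing the expression

$$ \sum_i g_i f_j^2(t_i) = 1 $$

and substituting $f_j(t_i)$ from (12.1.16) one obtains

$$ \gamma^2 = \sum_i g_i \left[ (t_i - \alpha) f_{j-1}(t_i) - \beta f_{j-2}(t_i) \right]^2 . \tag{12.1.19} $$

Once the quantities $\alpha$, $\beta$, $\gamma$ have been computed for a given $j$, the coefficients $b_{jk}$ are determined to be

$$ \begin{aligned} b_{j1} &= (-\alpha b_{j-1,1} - \beta b_{j-2,1})/\gamma , \\ b_{jk} &= (b_{j-1,k-1} - \alpha b_{j-1,k} - \beta b_{j-2,k})/\gamma , \qquad k = 2, \ldots, j-2 , \\ b_{j,j-1} &= (b_{j-1,j-2} - \alpha b_{j-1,j-1})/\gamma , \\ b_{jj} &= b_{j-1,j-1}/\gamma . \end{aligned} \tag{12.1.20} $$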

With (12.1.3) and (12.1.5) one thus obtains the quantities $A_{ij} = f_j(t_i)$. Since the procedure is recursive, the column $j$ of the matrix $A$ is obtained from the elements of the columns to its left. By extending the matrix to the right, the original matrix elements are not changed. This has the consequence that by increasing the number of terms in the polynomial (12.1.2) from $r$ to $r'$, the coefficients $\tilde{x}_1, \ldots, \tilde{x}_r$ are retained, and one merely obtains additional coefficients. All of the $\tilde{x}_j$ are uncorrelated and have the standard deviation $\sigma_{\tilde{x}_j} = 1$. The order of the polynomial sufficient to describe the data can now be determined by the requirement

$$ |\tilde{x}_j| < c , \qquad j > r . $$

With this the contribution of all higher coefficients $\tilde{x}_j$ is smaller than $c\,\sigma_{\tilde{x}_j}$. The simplest choice of $c$ is clearly $c = 1$.

For a given $r$ one can now substitute the values $b_{jk}$ into (12.1.3) and $f_j(t_i)$ into (12.1.2) in order to obtain the best estimates $\tilde{\eta}(t_i)$ of the true values $\eta(t_i)$ corresponding to the measurements $y_i(t_i)$. One can also compute the quantity

$$ M = \sum_{i=1}^{N} \frac{(\tilde{\eta}(t_i) - y_i(t_i))^2}{\sigma_i^2} . $$

If the data are in fact described by the polynomial of the chosen order, then the quantity $M$ follows a $\chi^2$-distribution with $f = N - r$ degrees of freedom. It can be used for a $\chi^2$-test for the goodness-of-fit of the polynomial.
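The recursion (12.1.12) to (12.1.20) translates almost line by line into code. The NumPy sketch below is an illustration under its own conventions (0-based indices, hypothetical function names `orthonormal_basis` and `orthogonal_fit`); it builds the matrix $A$ and the coefficients $b_{jk}$, fits $\tilde{x} = A^T G_y y$, and returns $M$. The hypothetical test data are an exact quadratic, so the fit with $r = 3$ should reproduce them and any further coefficient should vanish.

```python
import numpy as np


def orthonormal_basis(t, g, r):
    """Recursion (12.1.12)-(12.1.20) for polynomials orthonormal with respect to the
    points t_i and weights g_i. Returns B (r x r, B[j, k] = b_{j+1,k+1}) and
    A (N x r, A[i, j] = f_{j+1}(t_i)); indices are 0-based. Requires r <= N distinct t_i."""
    t, g = np.asarray(t, float), np.asarray(g, float)
    B = np.zeros((r, r))
    A = np.zeros((len(t), r))
    B[0, 0] = 1.0 / np.sqrt(g.sum())                         # (12.1.12)
    A[:, 0] = B[0, 0]
    if r > 1:
        tbar = np.sum(g * t) / g.sum()                       # (12.1.14)
        B[1, 1] = 1.0 / np.sqrt(np.sum(g * (t - tbar) ** 2)) # (12.1.15)
        B[1, 0] = -B[1, 1] * tbar                            # (12.1.13)
        A[:, 1] = B[1, 0] + B[1, 1] * t
    for j in range(2, r):
        f1, f2 = A[:, j - 1], A[:, j - 2]
        alpha = np.sum(g * t * f1 ** 2)                      # (12.1.17)
        beta = np.sum(g * t * f1 * f2)                       # (12.1.18)
        h = (t - alpha) * f1 - beta * f2                     # gamma * f_j(t_i), (12.1.16)
        gamma = np.sqrt(np.sum(g * h ** 2))                  # (12.1.19)
        A[:, j] = h / gamma
        B[j, 1:j + 1] += B[j - 1, :j]                        # (12.1.20): term t * f_{j-1}
        B[j, :j] -= alpha * B[j - 1, :j]                     #            term -alpha * f_{j-1}
        B[j, :j - 1] -= beta * B[j - 2, :j - 1]              #            term -beta * f_{j-2}
        B[j] /= gamma
    return B, A


def orthogonal_fit(t, y, sigma, r):
    """Coefficients (12.1.9), fitted values and the goodness-of-fit quantity M."""
    g = 1.0 / np.asarray(sigma, float) ** 2
    y = np.asarray(y, float)
    B, A = orthonormal_basis(t, g, r)
    x = A.T @ (g * y)                                        # x~ = A^T G_y y
    eta = A @ x                                              # fitted values at the t_i
    M = float(np.sum(g * (y - eta) ** 2))                    # chi^2 with N - r degrees of freedom
    return x, B, eta, M


t = np.arange(6.0)
y = 1.0 + 2.0 * t + 0.5 * t ** 2          # hypothetical, noise-free quadratic
sigma = np.ones_like(t)
for r in range(1, 5):
    x, B, eta, M = orthogonal_fit(t, y, sigma, r)
    print(r, np.round(x, 4), round(M, 6))
```

Because the $f_j$ are orthonormal, adding the term $j = r$ lowers $M$ by exactly $\tilde{x}_r^2$, while the earlier coefficients stay unchanged. This is why $M$ in Table 12.1 drops by roughly $\tilde{x}_r^2$ from one row to the next (e.g., $585.45 - 23.43^2 \approx 36.5$ against 36.41 in the table, the difference coming from the rounding of the printed values).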
Example 12.1: Treatment of Example 9.2 with Orthogonal Polynomials

Application of polynomial regression to the data of Example 9.2 yields the results shown in Table 12.1. The numerical values of the minimization function are, of course, exactly the same as in Example 9.2. The fitted polynomials are also the same. The numerical values $\tilde{x}_1, \tilde{x}_2, \ldots$ are different from those in Example 9.2 since these quantities are now defined differently. We have emphasized that the covariance matrix for $\tilde{x}$ is the $r \times r$ unit matrix. We see that the quantities $\tilde{x}_6, \tilde{x}_7, \ldots, \tilde{x}_{10}$ have magnitudes less than unity and thus are no longer significantly different from zero. It is more difficult to judge the significance of $\tilde{x}_5 = 1.08$. In many cases one would not consider this value as being clearly different from zero. This means that a third-order polynomial (i.e., $r = 4$) is sufficient to describe the data. ■


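Table 12.1: Results of the application of polynomial regression using the data from Example 9.2.

| $r$ | $\tilde{x}_r$ | $M$ | Degrees of freedom |
|---|---|---|---|
| 1 | 24.05 | 833.55 | 9 |
| 2 | 15.75 | 585.45 | 8 |
| 3 | 23.43 | 36.41 | 7 |
| 4 | 5.79 | 2.85 | 6 |
| 5 | 1.08 | 1.69 | 5 |
| 6 | 0.15 | 1.66 | 4 |
| 7 | 0.85 | 0.94 | 3 |
| 8 | -0.41 | 0.77 | 2 |
| 9 | -0.45 | 0.57 | 1 |
| 10 | 0.75 | 0.00 | 0 |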

12.2  Regression  Curve:  Confidence  Interval 

Once the order of the polynomial (12.1.2) has been fixed in one way or another, every point of the regression polynomial, or, in reference to the graphical representation, of the regression curve, can be computed. This is done by first substituting the recursively computed values $b_{jk}$ into (12.1.3) for a given $t$, and then using the resulting $f_j(t)$ and the parameters $\tilde{\mathbf x}$ in (12.1.2):

$$\tilde\eta(t) = \sum_{j=1}^{r} \tilde x_j \left( \sum_{k=1}^{j} b_{jk}\, t^{k-1} \right) = \mathbf d^{\mathrm T}(t)\, \tilde{\mathbf x} . \qquad (12.2.1)$$
Here $\mathbf d$ is an $r$-vector with the elements

$$d_j(t) = \sum_{k=1}^{j} b_{jk}\, t^{k-1} . \qquad (12.2.2)$$


After error propagation, (3.8.4), the variance of $\tilde\eta(t)$ is

$$\sigma^2_{\tilde\eta}(t) = \mathbf d^{\mathrm T}(t)\, \mathbf d(t) , \qquad (12.2.3)$$

since $C_{\tilde x} = I$. The reduced variable

$$u = \frac{\tilde\eta(t) - \eta(t)}{\sigma_{\tilde\eta}(t)} \qquad (12.2.4)$$

thus follows a standard Gaussian distribution. This also means that the probability for the position of the true value $\eta(t)$ is distributed according to a Gaussian of width $\sigma_{\tilde\eta}(t)$ centered about the value $\tilde\eta(t)$.

We can now easily give a confidence interval for $\eta(t)$. According to Sect. 5.8 one has with probability $P$ that

$$|u| \le \Omega'(P) = \Omega\left(\tfrac{1}{2}(P + 1)\right) , \qquad (12.2.5)$$

e.g., $\Omega'(0.95) = 1.96$. Thus the confidence limits at the confidence level $P$ (e.g., $P = 0.95$) are

$$\eta(t) = \tilde\eta(t) \pm \Omega'(P)\, \sigma_{\tilde\eta}(t) = \tilde\eta(t) \pm \delta\tilde\eta(t) . \qquad (12.2.6)$$
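Equations (12.2.1)-(12.2.6) can be sketched in Java as follows. The sketch assumes that the coefficients $b_{jk}$ and the fitted parameters $\tilde{\mathbf x}$ have already been obtained with the recursion of Sect. 12.1 (not reproduced here) and are passed in by the caller; the class name `RegressionBand` and the numbers in `main` are hypothetical. The quantile $\Omega'(P)$ is also supplied by the caller, e.g. 1.96 for $P = 0.95$.

```java
/**
 * Sketch of eqs. (12.2.1)-(12.2.6): value and confidence band of the regression
 * curve at t, given the coefficients b_jk of the orthogonal polynomials and the
 * fitted parameters x~ (both assumed to come from the procedure of Sect. 12.1).
 * Hypothetical helper class.
 */
final class RegressionBand {
    /** d_j(t) = sum_{k=1}^{j} b_jk t^(k-1), eq. (12.2.2); b[j][k] is 0-based and lower triangular. */
    static double[] basis(double[][] b, double t) {
        int r = b.length;
        double[] d = new double[r];
        for (int j = 0; j < r; j++) {
            double sum = 0.0;
            double power = 1.0;                 // t^k for k = 0, 1, 2, ...
            for (int k = 0; k <= j; k++) {
                sum += b[j][k] * power;
                power *= t;
            }
            d[j] = sum;
        }
        return d;
    }

    /** Returns {eta~(t), sigma_eta(t), delta_eta(t)}; omega is Omega'(P), e.g. 1.96 for P = 0.95. */
    static double[] band(double[][] b, double[] x, double t, double omega) {
        double[] d = basis(b, t);
        double eta = 0.0;
        double variance = 0.0;
        for (int j = 0; j < d.length; j++) {
            eta += d[j] * x[j];                 // eq. (12.2.1): eta~ = d^T x~
            variance += d[j] * d[j];            // eq. (12.2.3): sigma^2 = d^T d, since C_x = I
        }
        double sigma = Math.sqrt(variance);
        return new double[] {eta, sigma, omega * sigma};   // eq. (12.2.6): eta~ +- delta
    }

    public static void main(String[] args) {
        double[][] b = {{0.5, 0.0}, {1.0, 2.0}};   // hypothetical coefficients, r = 2
        double[] x = {2.0, 1.0};                   // hypothetical fitted parameters
        double[] res = band(b, x, 1.0, 1.96);
        System.out.printf("eta = %.3f, sigma = %.3f, limits = [%.3f, %.3f]%n",
                res[0], res[1], res[0] - res[2], res[0] + res[2]);
    }
}
```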


12.3  Regression  with  Unknown  Errors 


If the measurement errors $\sigma_i = 1/\sqrt{g_i}$ are not known, but it can be assumed that they are equal,

$$\sigma_i = \sigma , \qquad i = 1, \ldots, N , \qquad (12.3.1)$$

then one can easily obtain an estimate $s$ for $\sigma$ directly from the regression itself. The quantity

$$M = \sum_{i=1}^{N} \frac{(\tilde\eta(t_i) - y_i)^2}{\sigma^2} \qquad (12.3.2)$$


follows a $\chi^2$-distribution with $f = N - r$ degrees of freedom. More precisely, if many similar experiments with $N$ measurements each were to be carried out, then the values of $M$ computed for each experiment would be distributed like a $\chi^2$-variable with $f = N - r$ degrees of freedom. Its expectation value




is simply $f = N - r$. If the value of $\sigma$ in (12.3.2) is not known, then it can be replaced by an estimate $s$ such that the right-hand side equals the expectation value of $M$,

$$s^2 = \frac{1}{N - r} \sum_{i=1}^{N} (\tilde\eta(t_i) - y_i)^2 . \qquad (12.3.3)$$


Here all of the steps necessary to compute $\tilde\eta(t_i)$ must be carried out using the formulas of the last section with the value $\sigma_i = 1$. (In fact, for the case (12.3.1) the quantities $\sigma_i$ and $g_i$ could be removed completely from the formulas for computing $\tilde{\mathbf x}$ and $\tilde\eta(t)$.)

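If we define

$$\tilde s^2_{\tilde\eta}(t) = \mathbf d^{\mathrm T}(t)\, \mathbf d(t)$$

as the value of the variance of $\tilde\eta(t)$ obtained for $\sigma_i = 1$, then

$$s^2_{\tilde\eta}(t) = s^2\, \tilde s^2_{\tilde\eta}(t) \qquad (12.3.4)$$

is clearly the estimate of the variance of $\tilde\eta(t)$ that is based on the estimate (12.3.3) for the variance of the measured quantities. Replacing $\sigma_{\tilde\eta}(t)$ by $s_{\tilde\eta}(t)$ in (12.2.4), we obtain the variable

$$v = \frac{\tilde\eta(t) - \eta(t)}{s_{\tilde\eta}(t)} , \qquad (12.3.5)$$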


which no longer follows a standard normal distribution but rather Student's $t$-distribution for $f = N - r$ degrees of freedom (cf. Sect. 8.3). For the confidence limits at the confidence level $P = 1 - \alpha$ one now has

$$\eta(t) = \tilde\eta(t) \pm t_{1-\alpha/2}\, s_{\tilde\eta}(t) = \tilde\eta(t) \pm \delta\tilde\eta(t) , \qquad (12.3.6)$$

where $t_{1-\alpha/2}$ is the quantile of the $t$-distribution for $f = N - r$ degrees of freedom.
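For the straight-line case $r = 2$ with equal but unknown errors, (12.3.3)-(12.3.6) can be sketched in Java as below. For brevity the sketch uses the closed-form unweighted straight-line fit instead of the orthogonal-polynomial recursion of Sect. 12.1; for $\sigma_i = 1$ the variance of the fitted line at $t$ is then the standard result $\tilde s^2_{\tilde\eta}(t) = 1/N + (t - \bar t)^2 / S_{tt}$ with $S_{tt} = \sum_i (t_i - \bar t)^2$. The Java standard library has no Student-$t$ quantile function, so $t_{1-\alpha/2}$ for $N - 2$ degrees of freedom is passed in by the caller. The class name and the data in `main` are hypothetical.

```java
/**
 * Sketch of eqs. (12.3.3)-(12.3.6) for a straight line (r = 2) with equal but
 * unknown errors. Uses closed-form unweighted least squares (an assumption made
 * for brevity) rather than the orthogonal polynomials of Sect. 12.1.
 */
final class LineBandUnknownErrors {
    final double tMean, stt, intercept, slope, s2;
    final int n;

    LineBandUnknownErrors(double[] t, double[] y) {
        n = t.length;
        if (n < 3) throw new IllegalArgumentException("need N > r = 2 points");
        double tm = 0.0, ym = 0.0;
        for (int i = 0; i < n; i++) { tm += t[i]; ym += y[i]; }
        tm /= n;
        ym /= n;
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (t[i] - tm) * (t[i] - tm);
            sxy += (t[i] - tm) * (y[i] - ym);
        }
        tMean = tm;
        stt = sxx;
        slope = sxy / sxx;
        intercept = ym - slope * tm;
        double ssq = 0.0;
        for (int i = 0; i < n; i++) {
            double res = y[i] - (intercept + slope * t[i]);
            ssq += res * res;
        }
        s2 = ssq / (n - 2);                                  // eq. (12.3.3) with r = 2
    }

    /** Fitted value eta~(t). */
    double eta(double t) {
        return intercept + slope * t;
    }

    /** Half-width delta eta~(t) = t_{1-alpha/2} * s_eta(t), eqs. (12.3.4) and (12.3.6). */
    double halfWidth(double t, double tQuantile) {
        double sTilde2 = 1.0 / n + (t - tMean) * (t - tMean) / stt;   // variance for sigma_i = 1
        return tQuantile * Math.sqrt(s2 * sTilde2);
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3};
        double[] y = {0, 1, 1, 2};                           // hypothetical data
        LineBandUnknownErrors fit = new LineBandUnknownErrors(t, y);
        double q = 4.303;                                    // t_{0.975} for N - r = 2 degrees of freedom
        for (double tt = -1.0; tt <= 4.0; tt += 1.0) {
            System.out.printf("t = %4.1f  eta = %6.3f  +- %6.3f%n", tt, fit.eta(tt), fit.halfWidth(tt, q));
        }
    }
}
```

The half-width is smallest at $\bar t$ and grows quadratically away from it, which is the widening of the confidence band described for Fig. 12.1.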

It must be emphasized at this point that in the case where the errors are not known, one loses the possibility of applying the $\chi^2$-test: the goodness-of-fit of such a polynomial cannot be tested. Thus the procedure described at the end of Sect. 12.1 for determining the order of the polynomial is no longer valid, and one must rely on a priori knowledge about the order of the polynomial. Therefore one almost always refrains from treating anything beyond a linear dependence between $\eta$ and $t$, i.e., the case $r = 2$.

Example  12.2:  Confidence  limits  for  linear  regression 

In the upper plot of Fig. 12.1, four measured points with their errors, the corresponding regression line, and the limits at a confidence level of 95% are shown. The measured points are taken from the example of Sect. 9.3. The confidence limits clearly become wider the farther one moves from the region of the measured points. The narrowest region is close to the measured point with the smallest error. The plot is similar to Fig. 9.2d. It is easy to convince oneself that the envelope of the lines of Fig. 9.2d corresponds to the 68.3% confidence limit. In the lower plot of Fig. 12.1 the same measured points were used, but the errors were treated as equal but of unknown magnitude. The regression line and 95% confidence limits are shown for this assumption. ■

Fig. 12.1: (Above) Measurements with errors and regression line with 95% confidence limits. (Below) Measured points with errors assumed to be of unknown but equal magnitude, regression line, and 95% confidence limits.
12.4 Java Class and Example Programs

Java  Class 

Regression  performs  a  polynomial  regression. 


Example Program 12.1: The class E1Reg demonstrates the use of
Regression

The short program contains the data of Examples 9.2 and 12.1. As measurement errors the statistical errors $\Delta y_i = \sqrt{y_i}$ are taken. A polynomial regression with $r = 10$ parameters is performed. The results are presented numerically (see also Table 12.1).

Example  Program  12.2:  The  class  E2Reg  demonstrates  the  use  of 
Regression  and  presents  the  results  in  graphical  form 

The same problem as in E1Reg is treated. An additional input variable is $r_{\max}$, the maximum number of terms in the polynomials to be presented graphically. Graphs of these polynomials are shown together with the data points.

Suggestion: Choose consecutively $r_{\max} = 2, 3, 4, 6, 10$. Try to explain why the curves for $r_{\max} = 3$ or 4 seem to be particularly convincing although for $r_{\max} = 10$ all data points lie exactly on the graph of the polynomial.

Example  Program  12.3:  The  class  E3Reg  demonstrates  the  use  of 
Regression  and  graphically  represents  the  regression  line  and  its 
confidence  limits 

This program again treats the same problem as E1Reg. Additional input parameters
are  r,  the  order  of  the  polynomial  to  be  fitted,  and  the  probability  P,  which  determines 
the  confidence  limits.  Presented  in  a  plot  are  the  data  points  with  their  errors,  the 
regression  line,  and  -  in  a  different  color  -  its  confidence  limits. 

Example  Program  12.4:  The  class  E4Reg  demonstrates  the  linear 
regression  with  known  and  with  unknown  errors 

The program uses the data of Example 12.2. The user can declare the errors to be either known or unknown and chooses a probability $P$, which determines the confidence limits. The graphics output, corresponding to Fig. 12.1, contains data points, regression line, and confidence limits.


13.  Time  Series  Analysis 


13.1  Time  Series:  Trend 

In the previous chapter we considered the dependence of a random variable $y$ on a controlled variable $t$. As in that case we will assume here that $y$ consists of two parts, the true value of the measured quantity $\eta$ and a measurement error $\varepsilon$,

$$y_i = \eta_i + \varepsilon_i , \qquad i = 1, 2, \ldots, n . \qquad (13.1.1)$$

In Chap. 12 we assumed that $\eta$ was a polynomial in $t$. The measurement error $\varepsilon_i$ was considered to be normally distributed about zero.

We now want to make less restrictive assumptions about $\eta$. In this chapter we call the controlled variable $t$ "time", although in many applications it could be something different. The method we want to discuss is called time series analysis and is often applied in economic problems. It can always be used where one has little or no knowledge about the functional relationship between $\eta$ and $t$. In time series problems it is common to observe the $y_i$ at equally spaced points in time,

$$t_i - t_{i-1} = \Delta t = \text{const} , \qquad (13.1.2)$$

since this leads to a significant simplification of the formulas.

An example of a time series is shown in Fig. 13.1. If we look first only at the measured points, we notice strong fluctuations from point to point. Nevertheless they clearly follow a certain trend. In the left half of the plot they are mostly positive, and in the right half, mostly negative. One could qualitatively obtain the average time dependence by drawing a smooth curve by hand through the points. Since, however, such curves are not free from personal influences, and are thus not reproducible, we must try to develop an objective method.
Fig. 13.1: Data points (circles) and moving average (joined by line segments).


We use the notation from (13.1.1) and call $\eta_i$ the trend and $\varepsilon_i$ the random component of the measurement $y_i$. In order to obtain a smoother function of $t$, one can, for example, construct for every value of $y_i$ the expression

$$u_i = \frac{1}{2k+1} \sum_{j=i-k}^{i+k} y_j , \qquad (13.1.3)$$

i.e., the unweighted mean of the measurements for the times

$$t_{i-k},\ t_{i-k+1},\ \ldots,\ t_{i-1},\ t_i,\ t_{i+1},\ \ldots,\ t_{i+k} .$$

The expression (13.1.3) is called a moving average of $y$.
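The following minimal Java sketch evaluates (13.1.3). The class `SimpleMovingAverage` is hypothetical and is not the TimeSeries class of Sect. 13.5. It computes $u_i$ for every point that has $k$ neighbors on both sides and leaves the first and last $k$ entries undefined (NaN), since those points need the separate treatment of Sect. 13.3. Array indices are 0-based, unlike the 1-based $i$ of the text.

```java
import java.util.Arrays;

/** Sketch of the unweighted moving average (13.1.3); hypothetical helper class. */
final class SimpleMovingAverage {
    /**
     * u_i = (1/(2k+1)) * sum_{j=i-k}^{i+k} y_j for the points that have k neighbours
     * on each side; the first and last k entries are left as NaN (cf. Sect. 13.3).
     * Indices are 0-based.
     */
    static double[] movingAverage(double[] y, int k) {
        int n = y.length;
        double[] u = new double[n];
        Arrays.fill(u, Double.NaN);
        for (int i = k; i < n - k; i++) {
            double sum = 0.0;
            for (int j = i - k; j <= i + k; j++) {
                sum += y[j];
            }
            u[i] = sum / (2 * k + 1);
        }
        return u;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 3.0, 2.0, 4.0, 3.0, 5.0};   // hypothetical series
        System.out.println(Arrays.toString(movingAverage(y, 1)));
    }
}
```

For the six hypothetical values in `main` and $k = 1$ the program prints `[NaN, 2.0, 3.0, 3.0, 4.0, NaN]`.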


13.2  Moving  Averages 

Of course the moving average (13.1.3) is a very simple construction. We will show later (in Example 13.1) that the use of a moving average of this form is equivalent to the assumption that $\eta$ is a linear function of time in the interval considered,

$$\eta_j = \alpha + \beta t_j , \qquad j = -k, -k+1, \ldots, k . \qquad (13.2.1)$$
Here $\alpha$ and $\beta$ are constants. They can be estimated from the data by linear regression.

Instead of restricting ourselves to the linear case, we will assume more generally that $\eta$ can be a polynomial of order $\ell$.

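In the averaging interval, $t$ takes on the values

$$t_j = t_i + j\,\Delta t , \qquad j = -k, -k+1, \ldots, k . \qquad (13.2.2)$$

Because $\eta$ is a polynomial in $t$,

$$\eta_j = a_1 + a_2 t_j + a_3 t_j^2 + \cdots + a_{\ell+1} t_j^{\ell} , \qquad (13.2.3)$$

it is also a polynomial in $j$,

$$\eta_j = x_1 + x_2 j + x_3 j^2 + \cdots + x_{\ell+1} j^{\ell} , \qquad (13.2.4)$$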

since (13.2.2) describes a linear transformation between $t_j$ and $j$, i.e., it is merely a change of scale. We now want to obtain the coefficients $x_1, x_2, \ldots, x_{\ell+1}$ from the data by fitting with least squares. This task has already been treated in Sect. 9.4.1. We assume (in the absence of any better knowledge) that all of the measurements are of the same accuracy. Thus the matrix $G_y = I/\sigma^2$ is simply a multiple of the unit matrix $I$. According to (9.2.26), the vector of coefficients is thus given by

$$\tilde{\mathbf x} = -(A^{\mathrm T} A)^{-1} A^{\mathrm T} \mathbf y , \qquad (13.2.5)$$

where $A$ is a $(2k+1) \times (\ell+1)$ matrix,


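$$A = -\begin{pmatrix} 1 & -k & (-k)^2 & \cdots & (-k)^{\ell} \\ 1 & -k+1 & (-k+1)^2 & \cdots & (-k+1)^{\ell} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & k & k^2 & \cdots & k^{\ell} \end{pmatrix} . \qquad (13.2.6)$$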


For the trend $\eta_0$ at the center of the averaging interval ($j = 0$) we obtain from (13.2.4) the estimate

$$\tilde\eta_0 = \tilde x_1 . \qquad (13.2.7)$$

It is equal to the first coefficient of the polynomial. According to (13.2.5), $\tilde x_1$ is obtained by multiplying the column vector of measurements $\mathbf y$ from the left by the row vector

$$\mathbf a = \left( -(A^{\mathrm T} A)^{-1} A^{\mathrm T} \right)_1 , \qquad (13.2.8)$$

i.e., by the first row of the matrix $-(A^{\mathrm T} A)^{-1} A^{\mathrm T}$. We obtain

$$\tilde\eta_0 = \mathbf a \mathbf y = a_{-k} y_{-k} + a_{-k+1} y_{-k+1} + \cdots + a_0 y_0 + \cdots + a_k y_k . \qquad (13.2.9)$$
Table 13.1: Components of the vector a for computing moving averages. Listed are a_{-k}, ..., a_0; the remaining components follow from the symmetry a_j = a_{-j}.

ℓ = 2 and 3
 k   factor     a_{-k}, ..., a_0
 2   1/35       -3, 12, 17
 3   1/21       -2, 3, 6, 7
 4   1/231      -21, 14, 39, 54, 59
 5   1/429      -36, 9, 44, 69, 84, 89
 6   1/143      -11, 0, 9, 16, 21, 24, 25
 7   1/1105     -78, -13, 42, 87, 122, 147, 162, 167

ℓ = 4 and 5
 k   factor     a_{-k}, ..., a_0
 3   1/231      5, -30, 75, 131
 4   1/429      15, -55, 30, 135, 179
 5   1/429      18, -45, -10, 60, 120, 143
 6   1/2431     110, -198, -135, 110, 390, 600, 677
 7   1/46189    2145, -2860, -2937, -165, 3755, 7500, 10125, 11063

This is a linear function of the measurements within the averaging interval. Here the vector $\mathbf a$ does not depend on the measurements, but only on $\ell$ and $k$, i.e., on the order of the polynomial and on the length of the interval. Clearly the number of polynomial coefficients must be smaller than the number of points in the interval,

$$\ell + 1 < 2k + 1 ,$$

since otherwise there would not remain any degrees of freedom for the least-squares fit. The components of $\mathbf a$ for small values of $\ell$ and $k$ can be obtained from Table 13.1.
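The components of $\mathbf a$ can also be computed directly from (13.2.8). The Java sketch below (hypothetical class name; not the Datan TimeSeries implementation) writes $M_{jp} = j^{p}$ for the matrix of powers, so that $A = -M$, consistent with the minus sign in (13.2.5), and $-(A^{\mathrm T}A)^{-1}A^{\mathrm T} = (M^{\mathrm T}M)^{-1}M^{\mathrm T}$. Since $M^{\mathrm T}M$ is symmetric, the first row of its inverse is obtained by solving $(M^{\mathrm T}M)\,\mathbf z = \mathbf e_1$, after which $a_j = \sum_p z_p\, j^p$. Plain Gaussian elimination on the normal equations is adequate for the small $k$ and $\ell$ considered here but is not the numerically preferred method for larger problems.

```java
/**
 * Sketch of eq. (13.2.8): components a_{-k}, ..., a_k of the moving-average vector
 * for a polynomial of order l fitted to 2k+1 equally spaced points. Hypothetical
 * helper class; it solves the normal equations directly, which is adequate only
 * for small k and l.
 */
final class MovingAverageWeights {
    /** Returns a[0..2k], where a[j + k] holds a_j. */
    static double[] weights(int k, int l) {
        if (k < 1 || l < 0 || l + 1 >= 2 * k + 1) {
            throw new IllegalArgumentException("need k >= 1 and l + 1 < 2k + 1");
        }
        int m = l + 1;
        // Augmented system [M^T M | e_1] with M_{jp} = j^p, i.e. A = -M.
        double[][] g = new double[m][m + 1];
        for (int p = 0; p < m; p++) {
            for (int q = 0; q < m; q++) {
                double s = 0.0;
                for (int j = -k; j <= k; j++) {
                    s += Math.pow(j, p + q);
                }
                g[p][q] = s;
            }
            g[p][m] = (p == 0) ? 1.0 : 0.0;
        }
        double[] z = solve(g);   // first column (= first row) of (M^T M)^{-1}
        double[] a = new double[2 * k + 1];
        for (int j = -k; j <= k; j++) {
            double s = 0.0;
            for (int p = 0; p < m; p++) {
                s += z[p] * Math.pow(j, p);
            }
            a[j + k] = s;        // a_j = (first row of (M^T M)^{-1} M^T)_j
        }
        return a;
    }

    /** Gaussian elimination with partial pivoting on an augmented m x (m+1) matrix (modified in place). */
    private static double[] solve(double[][] g) {
        int m = g.length;
        for (int c = 0; c < m; c++) {
            int piv = c;
            for (int r = c + 1; r < m; r++) {
                if (Math.abs(g[r][c]) > Math.abs(g[piv][c])) piv = r;
            }
            double[] tmp = g[c];
            g[c] = g[piv];
            g[piv] = tmp;
            for (int r = c + 1; r < m; r++) {
                double f = g[r][c] / g[c][c];
                for (int cc = c; cc <= m; cc++) {
                    g[r][cc] -= f * g[c][cc];
                }
            }
        }
        double[] z = new double[m];
        for (int r = m - 1; r >= 0; r--) {
            double s = g[r][m];
            for (int cc = r + 1; cc < m; cc++) {
                s -= g[r][cc] * z[cc];
            }
            z[r] = s / g[r][r];
        }
        return z;
    }

    public static void main(String[] args) {
        double[] a = weights(2, 2);
        StringBuilder sb = new StringBuilder();
        for (double v : a) sb.append(String.format("%.4f ", 35.0 * v));   // scaled by the factor 35
        System.out.println(sb.toString().trim());
    }
}
```

Calling `weights(2, 3)` returns the same vector as `weights(2, 2)`, which is the equality of odd and next-lower even orders mentioned below in connection with (13.2.11).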


Equation (13.2.9) describes the moving average corresponding to the assumed polynomial (13.2.4). Once the vector $\mathbf a$ is determined, the moving averages

$$u_i = \tilde\eta_0(i) = \mathbf a \mathbf y(i) = a_1 y_{i-k} + a_2 y_{i-k+1} + \cdots + a_{2k+1} y_{i+k} \qquad (13.2.10)$$

can easily be computed for each value of $i$.
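Once $\mathbf a$ is known, (13.2.10) is a simple weighted sum over a sliding window. The following Java sketch (hypothetical class name) applies a given weight vector; the weights in `main` are $(-3, 12, 17, 12, -3)/35$, which is what (13.2.8) yields for $\ell = 2$ (or 3) and $k = 2$. With noise-free cubic data the interior moving averages reproduce the data exactly, as expected since the fitted polynomial of order 3 describes such data without error.

```java
import java.util.Arrays;

/** Sketch of eq. (13.2.10): moving average with a given weight vector a (hypothetical helper). */
final class WeightedMovingAverage {
    /**
     * u_i = a_1 y_{i-k} + a_2 y_{i-k+1} + ... + a_{2k+1} y_{i+k}; a[0..2k] holds a_1..a_{2k+1}
     * and y is 0-based. Points without k neighbours on both sides are left as NaN.
     */
    static double[] apply(double[] a, double[] y) {
        if (a.length % 2 == 0) throw new IllegalArgumentException("a must have 2k+1 components");
        int k = (a.length - 1) / 2;
        double[] u = new double[y.length];
        Arrays.fill(u, Double.NaN);
        for (int i = k; i < y.length - k; i++) {
            double sum = 0.0;
            for (int m = 0; m < a.length; m++) {
                sum += a[m] * y[i - k + m];
            }
            u[i] = sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // Weights for l = 2 or 3 and k = 2, as given by eq. (13.2.8): (-3, 12, 17, 12, -3) / 35.
        double[] a = {-3 / 35.0, 12 / 35.0, 17 / 35.0, 12 / 35.0, -3 / 35.0};
        double[] y = new double[8];
        for (int i = 0; i < y.length; i++) y[i] = i * i * i;   // a cubic trend without noise
        System.out.println(Arrays.toString(apply(a, y)));      // interior values reproduce i^3
    }
}
```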


Example  13.1:  Moving  average  with  linear  trend 
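In the case of a linear trend function,

$$\eta_j = x_1 + x_2 j ,$$

the matrix $A$ becomes simply

$$A = -\begin{pmatrix} 1 & -k \\ 1 & -k+1 \\ \vdots & \vdots \\ 1 & k \end{pmatrix} .$$

One then has

$$A^{\mathrm T} A = \begin{pmatrix} 2k+1 & 0 \\ 0 & k(k+1)(2k+1)/3 \end{pmatrix} ,$$

$$(A^{\mathrm T} A)^{-1} = \begin{pmatrix} \dfrac{1}{2k+1} & 0 \\ 0 & \dfrac{3}{k(k+1)(2k+1)} \end{pmatrix} ,$$

$$\mathbf a = \left( -(A^{\mathrm T} A)^{-1} A^{\mathrm T} \right)_1 = \frac{1}{2k+1} (1, 1, \ldots, 1) .$$

In this case the moving average is simply the unweighted mean (13.1.3).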


For more complicated models one can obtain the vectors $\mathbf a$ either by solving (13.2.8) or simply from Table 13.1. Because of the symmetry of $A$, one can show that polynomials of odd order (i.e., $\ell = 2n + 1$ with $n$ an integer) have the same values of $\mathbf a$ as polynomials of the next lower order $\ell = 2n$. One can also easily show that $\mathbf a$ has the symmetry

$$a_j = a_{-j} , \qquad j = 1, 2, \ldots, k . \qquad (13.2.11)$$
13.3 Edge Effects

Of course the moving average (13.2.10) can be used to estimate the trend only for points $i$ that have at least $k$ neighboring points both to the right and to the left, since the averaging interval covers $2k + 1$ points. This means that for the first and last $k$ points of a time series one must use a different estimator. The most obvious generalization of the estimator is obtained by extrapolating the polynomial (13.2.4) rather than using it only at the center of an interval. One then obtains the estimators

$$\tilde\eta_i = \begin{cases} \tilde x_1^{(k+1)} + \tilde x_2^{(k+1)} (i-k-1) + \tilde x_3^{(k+1)} (i-k-1)^2 + \cdots + \tilde x_{\ell+1}^{(k+1)} (i-k-1)^{\ell} , & i \le k , \\[4pt] \tilde x_1^{(n-k)} + \tilde x_2^{(n-k)} (i+k-n) + \tilde x_3^{(n-k)} (i+k-n)^2 + \cdots + \tilde x_{\ell+1}^{(n-k)} (i+k-n)^{\ell} , & i > n-k . \end{cases} \qquad (13.3.1)$$

Here the notation $\tilde{\mathbf x}^{(k+1)}$ and $\tilde{\mathbf x}^{(n-k)}$ indicates that the coefficients $\tilde{\mathbf x}$ were determined for the first and last intervals of the time series, whose centers are at $k + 1$ and $n - k$.

The estimators are now defined even for $i < 1$ and $i > n$. They thus offer the possibility to continue the time series (e.g., into the future). Such extrapolations must be treated with great care for two reasons:

(i) Usually there is no theoretical justification for the assumption that the trend is described by a polynomial. It merely simplifies the computation of the moving average. Without a theoretical understanding of a trend model, the meaning of extrapolations is quite unclear.

(ii) Even in cases where the trend can rightly be described by a polynomial, the confidence limits quickly diverge from the estimated polynomial in the extrapolated region. The extrapolation becomes very inaccurate.

Whether point (i) applies must be carefully checked in each individual case. The more general point (ii) is already familiar from linear regression (cf. Fig. 12.1). We will investigate this in detail in the next section.
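The estimators (13.2.7) and (13.3.1) can be combined in one routine: fit the polynomial (13.2.4) in the window centered at $i$ when that window lies inside the series, and otherwise in the first or last window, then evaluate it at the offset $j$ of $i$ from that window's center. The Java sketch below (hypothetical class name) does this with 0-based indices and solves the normal equations directly, an assumption made for brevity that is adequate only for small $k$ and $\ell$.

```java
/**
 * Sketch of eqs. (13.2.7) and (13.3.1): trend estimate at any index i (0-based),
 * including the first and last k points and points outside the series. The window
 * polynomial is fitted by solving the normal equations directly (an assumption made
 * for brevity; adequate only for small k and l). Hypothetical helper class.
 */
final class TrendWithEdges {
    /** Least-squares coefficients x~_1..x~_{l+1} of (13.2.4) for the window centred at index c. */
    static double[] coefficients(double[] y, int c, int k, int l) {
        int m = l + 1;
        double[][] g = new double[m][m + 1];          // augmented normal equations [M^T M | M^T y]
        for (int j = -k; j <= k; j++) {
            for (int p = 0; p < m; p++) {
                double jp = Math.pow(j, p);
                for (int q = 0; q < m; q++) {
                    g[p][q] += jp * Math.pow(j, q);
                }
                g[p][m] += jp * y[c + j];
            }
        }
        return solve(g);
    }

    /** Trend at index i: window centre (13.2.7) inside, extrapolated first/last window (13.3.1) outside. */
    static double trend(double[] y, int k, int l, int i) {
        int n = y.length;
        if (k < 1 || n < 2 * k + 1 || l + 1 >= 2 * k + 1) throw new IllegalArgumentException("bad k or l");
        int c = Math.max(k, Math.min(n - 1 - k, i));   // centre of the window actually used
        double[] x = coefficients(y, c, k, l);
        int j = i - c;                                 // j = i - k - 1 or i + k - n in 1-based notation
        double eta = 0.0;
        for (int p = x.length - 1; p >= 0; p--) {
            eta = eta * j + x[p];                      // Horner evaluation of (13.2.4)
        }
        return eta;
    }

    /** Gaussian elimination with partial pivoting on an augmented m x (m+1) matrix (modified in place). */
    private static double[] solve(double[][] g) {
        int m = g.length;
        for (int c = 0; c < m; c++) {
            int piv = c;
            for (int r = c + 1; r < m; r++) {
                if (Math.abs(g[r][c]) > Math.abs(g[piv][c])) piv = r;
            }
            double[] tmp = g[c];
            g[c] = g[piv];
            g[piv] = tmp;
            for (int r = c + 1; r < m; r++) {
                double f = g[r][c] / g[c][c];
                for (int cc = c; cc <= m; cc++) g[r][cc] -= f * g[c][cc];
            }
        }
        double[] z = new double[m];
        for (int r = m - 1; r >= 0; r--) {
            double s = g[r][m];
            for (int cc = r + 1; cc < m; cc++) s -= g[r][cc] * z[cc];
            z[r] = s / g[r][r];
        }
        return z;
    }

    public static void main(String[] args) {
        double[] y = {2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2};   // hypothetical series
        for (int i = -2; i < y.length + 2; i++) {
            System.out.printf("i = %2d  trend = %.3f%n", i, trend(y, 2, 1, i));
        }
    }
}
```

For interior points the offset is $j = 0$ and the routine returns $\tilde x_1$, i.e., the moving average (13.2.7); for points near or beyond the ends it returns the extrapolated polynomial of (13.3.1).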


13.4  Confidence  Intervals 

We first consider the confidence interval for the moving average $u_i$ from Eq. (13.2.10). The errors of the measurements $y_i$ are unknown and must therefore first be estimated. From (12.3.2) one obtains for the sample variance of the $y_j$ in the interval of length $2k + 1$

$$s_y^2 = \frac{1}{2k - \ell} \sum_{j=-k}^{k} (y_j - \tilde\eta_j)^2 , \qquad (13.4.1)$$

where $\tilde\eta_j$ is given by

$$\tilde\eta_j = \tilde x_1 + \tilde x_2 j + \tilde x_3 j^2 + \cdots + \tilde x_{\ell+1} j^{\ell} . \qquad (13.4.2)$$

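The covariance matrix for the measurements can then be estimated by

$$G_y^{-1} \approx s_y^2 I . \qquad (13.4.3)$$

The covariance matrix of the coefficients $\tilde{\mathbf x}$ is then given by (9.2.27),

$$G_{\tilde x}^{-1} \approx (A^{\mathrm T} G_y A)^{-1} = s_y^2 (A^{\mathrm T} A)^{-1} . \qquad (13.4.4)$$

Since $u_i = \tilde\eta_0 = \tilde x_1$, we thus have for an estimator of the variance of $u_i$

$$s^2_{\tilde\eta_0} = (G_{\tilde x}^{-1})_{11} = s_y^2 \left( (A^{\mathrm T} A)^{-1} \right)_{11} = s_y^2\, a_0 . \qquad (13.4.5)$$

From (13.2.6), (13.2.7), and (13.2.8) one easily obtains that $((A^{\mathrm T} A)^{-1})_{11} = a_0$, since the middle row of $A$ is $-(1, 0, 0, \ldots, 0)$.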

Using the same reasoning as in Sect. 12.3 we obtain at a confidence level of $1 - \alpha$

$$\frac{|\tilde\eta_0(i) - \eta_0(i)|}{s_y \sqrt{a_0}} \le t_{1-\alpha/2} . \qquad (13.4.6)$$


For a given $\alpha$ we can give the confidence limits as

$$\eta_0^{\pm}(i) = \tilde\eta_0(i) \pm \sqrt{a_0}\, s_y\, t_{1-\alpha/2} . \qquad (13.4.7)$$

Here $t_{1-\alpha/2}$ is a quantile of Student's distribution for $2k - \ell$ degrees of freedom. The true value of the trend lies within these limits with a confidence level of $1 - \alpha$.
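For a linear trend ($\ell = 1$), the case shown in Fig. 13.2, every quantity in (13.4.1)-(13.4.7) has a closed form: the window fit decouples because $\sum_j j = 0$, and Example 13.1 gives $a_0 = 1/(2k+1)$. The Java sketch below (hypothetical class name) is restricted to that case. The quantile $t_{1-\alpha/2}$ for $2k - 1$ degrees of freedom is supplied by the caller, since the Java standard library has no Student-$t$ quantile function.

```java
/**
 * Sketch of eqs. (13.4.1)-(13.4.7) for a linear trend (l = 1) at an interior point.
 * Hypothetical helper class; the t quantile is supplied by the caller.
 */
final class LinearMovingAverageBand {
    /**
     * Returns {u_i, s_y^2, half-width} at interior index i (0-based) for l = 1.
     * tQuantile is t_{1-alpha/2} for 2k - 1 degrees of freedom.
     */
    static double[] band(double[] y, int i, int k, double tQuantile) {
        if (k < 1 || i - k < 0 || i + k >= y.length) {
            throw new IllegalArgumentException("window must lie inside the series");
        }
        int w = 2 * k + 1;
        double sum = 0.0, sumJY = 0.0, sumJJ = 0.0;
        for (int j = -k; j <= k; j++) {
            sum += y[i + j];
            sumJY += j * y[i + j];
            sumJJ += (double) j * j;
        }
        double x1 = sum / w;        // x~_1 = u_i, eqs. (13.2.7) and (13.1.3)
        double x2 = sumJY / sumJJ;  // x~_2 (slope per unit of j); decoupled because sum_j j = 0
        double ssq = 0.0;
        for (int j = -k; j <= k; j++) {
            double res = y[i + j] - (x1 + x2 * j);   // residual with respect to (13.4.2)
            ssq += res * res;
        }
        double sy2 = ssq / (2 * k - 1);              // eq. (13.4.1) with l = 1
        double a0 = 1.0 / w;                         // centre component of a, Example 13.1
        double half = Math.sqrt(a0 * sy2) * tQuantile;   // sqrt(a_0) s_y t_{1-alpha/2}
        return new double[] {x1, sy2, half};
    }

    public static void main(String[] args) {
        double[] y = {0, 2, 1, 3, 2, 4, 3};          // hypothetical series
        int k = 2;
        double q = 3.182;                            // t_{0.975} for 2k - 1 = 3 degrees of freedom
        for (int i = k; i < y.length - k; i++) {
            double[] r = band(y, i, k, q);
            System.out.printf("i = %d  u = %.3f  limits = [%.3f, %.3f]%n", i, r[0], r[0] - r[2], r[0] + r[2]);
        }
    }
}
```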

The determination of confidence limits at the ends of the time series is completely analogous, although computationally more demanding. The moving average is now given by (13.3.1). Labeling the arguments in the expressions (13.3.1) as $j = i - k - 1$ and $j = i + k - n$, respectively, we obtain

$$\tilde\eta = T \tilde{\mathbf x} . \qquad (13.4.8)$$

Here $T$ is a row vector of length $\ell + 1$,

$$T = (1, j, j^2, \ldots, j^{\ell}) . \qquad (13.4.9)$$
According to the law of error propagation (3.8.4) we obtain

$$G_{\tilde\eta}^{-1} = T\, G_{\tilde x}^{-1}\, T^{\mathrm T} . \qquad (13.4.10)$$

With (13.4.4) we finally have

$$G_{\tilde\eta}^{-1} \approx s_{\tilde\eta}^2 = s_y^2\, T (A^{\mathrm T} A)^{-1} T^{\mathrm T} , \qquad (13.4.11)$$

where $s_y^2$ is again given by (13.4.1).

The quantity $s_{\tilde\eta}^2$ can now be computed for every value of $j$, even for values lying outside of the time series itself. Thus we obtain the confidence limits

$$\eta^{\pm}(i) = \tilde\eta(i) \pm s_{\tilde\eta}\, t_{1-\alpha/2} . \qquad (13.4.12)$$
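For a linear trend ($\ell = 1$), Example 13.1 gives $(A^{\mathrm T}A)^{-1}$ in closed form, so that with $T = (1, j)$ equation (13.4.11) becomes $s_{\tilde\eta}^2 = s_y^2 \left[ 1/(2k+1) + 3j^2/(k(k+1)(2k+1)) \right]$. The short Java sketch below (hypothetical class name, restricted to $\ell = 1$) evaluates this and the half-width of (13.4.12); the residual variance and the $t$ quantile are supplied by the caller.

```java
/**
 * Sketch of eqs. (13.4.11) and (13.4.12) for a linear trend (l = 1), where Example 13.1
 * gives (A^T A)^{-1} = diag(1/(2k+1), 3/(k(k+1)(2k+1))). Hypothetical helper class.
 */
final class LinearTrendExtrapolation {
    /** Estimated variance of eta~ at offset j from the centre of the window (j may lie outside it). */
    static double variance(double sy2, int k, int j) {
        double w = 2.0 * k + 1.0;
        return sy2 * (1.0 / w + 3.0 * j * j / (k * (k + 1.0) * w));   // s_y^2 T (A^T A)^{-1} T^T
    }

    /** Half-width s_eta * t_{1-alpha/2} of the limits (13.4.12); tQuantile for 2k - 1 degrees of freedom. */
    static double halfWidth(double sy2, int k, int j, double tQuantile) {
        return Math.sqrt(variance(sy2, k, j)) * tQuantile;
    }

    public static void main(String[] args) {
        double sy2 = 1.0;                 // hypothetical residual variance
        int k = 3;
        double q = 2.571;                 // t_{0.975} for 2k - 1 = 5 degrees of freedom
        for (int j = 0; j <= 8; j += 2) {
            System.out.printf("j = %d  half-width = %.3f%n", j, halfWidth(sy2, k, j, q));
        }
    }
}
```

The variance grows quadratically with the offset $j$, which makes point (ii) of Sect. 13.3 quantitative: the limits diverge rapidly once $j$ exceeds $k$.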

Caution is always recommended in interpreting the results of a time series analysis. This is particularly true for two reasons:

1. There is usually no a priori justification for the mathematical model on which the time series analysis is based. One has simply chosen a convenient procedure in order to "separate out" statistical fluctuations.

2. The user has considerable freedom in choosing the parameters $k$ and $\ell$, which, however, can have a significant influence on the results. The following example gives an impression of the magnitude of such influences.

Example  13.2:  Time  series  analysis  of  the  same  set  of  measurements  using 
different  averaging  intervals  and  polynomials  of  different  orders 

Figures 13.2 and 13.3 contain time series analyses of the average number of sunspots observed in the 36 months from January 1962 through December 1964. Various values of $k$ and $\ell$ were used. The individual plots in Fig. 13.2 show $\ell = 1$ (linear averaging) but different interval lengths ($2k + 1 = 5$, 7, 9, and 11). One can see that the curve of moving averages becomes smoother and the confidence interval becomes narrower when $k$ increases, but that the mean deviation of the individual observations from the curve then also increases. The extrapolation outside the range of measured points is, of course, a straight line. (For $\ell = 0$ we would have obtained the same moving averages for the inner points. The outer and extrapolated points would, however, lie on a horizontal line, since a polynomial of order zero is a constant.) The plots in Fig. 13.3 correspond to the interval length $2k + 1 = 7$ and $\ell = 1, 2, 3, 4$. The moving averages lie closer to the data points and the confidence interval becomes larger when the value of $\ell$ increases. ■
Fig. 13.2: Time series analyses of the same data with fixed $\ell$ and various values of $k$.

Fig. 13.3: Time series analyses of the same data with fixed $k$ and various values of $\ell$.
From these observations we can derive the following qualitative rules:

1. The averaging interval should not be chosen larger than the region where one expects that the data can be well described by a polynomial of the given order. That is, for $\ell = 1$ the interval $2k + 1$ should be chosen such that the expected nonlinear effects within the interval remain small.

2. On the other hand, the smoothing effect becomes stronger as the length of the averaging interval increases. As a rule of thumb, the smoothing becomes more effective for increasing $2k + 1 - \ell$.

3. Caution is required in extrapolation of a time series, especially in nonlinear cases.

The  art  of  time  series  analysis  is,  of  course,  much  more  highly  developed 
than  what  we  have  been  able  to  describe  in  this  short  chapter.  The  interested 
reader  is  referred  to  the  specialized  literature,  where,  for  example,  smoothing 
functions  other  than  polynomials  or  multidimensional  analyses  are  treated. 


13.5  Java  Class  and  Example  Programs 

Java  Class  for  Time  Series  Analysis 
TimeSeries  performs  a  time  series  analysis. 

Example Program 13.1: The class E1TimSer demonstrates the use of
TimeSeries

The program uses the data of Example 13.2. After setting some parameters it performs a time series analysis by a call of TimeSeries with $k = 2$, $\ell = 2$. The data, moving averages, and distances to the confidence limits (at a confidence level of 90%) are output numerically.

Example  Program  13.2:  The  class  E2TimSer  performs  a  time  series 
analysis  and  yields  graphical  output 

The program works on the same data as E1TimSer. It allows interactive input for the parameters $k$ and $\ell$ and for the confidence level $P$ and then performs a time series analysis. Subsequently a plot is produced in which the data are displayed as small circles. The moving averages are shown as a polyline. Polylines in a different color indicate the confidence limits.

Suggestion:  Produce  the  individual  plots  of  Figs.  13.2  and  13.3. 
A. Matrix Calculations


The solution of the over-determined linear system of equations $A\mathbf x \approx \mathbf b$ is of central significance in data analysis. It can be solved in an optimal way with the singular value decomposition, first developed in the late 1960s. In this appendix the elementary definitions and calculation rules for matrices and vectors are summarized in Sects. A.1 and A.2. In Sect. A.3 orthogonal transformations are introduced, in particular the Givens and Householder transformations, which provide the key to the singular value decomposition.

After a few remarks on determinants (Sect. A.4) there follows in Sect. A.5 a discussion of various cases of matrix equations and a theorem on the orthogonal decomposition of an arbitrary matrix, which is of central importance in this regard. The classical procedure of normal equations, which, however, is inferior to the singular value decomposition, is described here.

Sections A.6-A.8 concern the particularly simple case of exactly determined, non-singular matrix equations. In this case the inverse matrix $A^{-1}$ exists, and the solution to the problem $A\mathbf x = \mathbf b$ is $\mathbf x = A^{-1}\mathbf b$. Methods and programs for finding the solution are given. The important special case of a positive-definite symmetric matrix is treated in Sect. A.9.

In Sect. A.10 we define the pseudo-inverse matrix $A^{+}$ of an arbitrary matrix $A$. After introducing eigenvectors and eigenvalues in Sect. A.11, the singular value decomposition is presented in Sects. A.12 and A.13. Computer routines are given in Sect. A.14. Modifications of the procedure and the consideration of constraints are the subject of Sects. A.15 through A.18.

It has been attempted to make the presentation illustrative rather than mathematically rigorous. Proofs in the text are only indicated in a general way, or are omitted entirely. As mentioned, the singular value decomposition is not yet widespread. An important goal of this appendix is to make its use possible. For readers with a basic knowledge of matrix calculations, the material covered in Sects. A.3, A.12, A.13, A.14.1, and A.18 is sufficient for this task. Sections A.14.2 through A.14.5 contain technical details on carrying out the singular value decomposition and can be omitted by hurried users.
All procedures described in this Appendix are implemented as methods of the classes DatanVector or DatanMatrix, respectively. Only in a few cases will we refer to these methods explicitly, in order to establish the connection with some of the more complicated algorithms in the text.


A.1 Definitions: Simple Operations


By a vector in $m$ dimensions (an $m$-vector) $\mathbf a$ we mean an $m$-tuple of real numbers, which are the components of $\mathbf a$. The arrangement of the components in the form

$$\mathbf a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} \qquad (A.1.1)$$

is called a column vector.

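An $m \times n$ matrix is a rectangular arrangement of $m \times n$ numbers in $m$ rows and $n$ columns,

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & & & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} . \qquad (A.1.2)$$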

It can be viewed as composed of $n$ column vectors. By transposition of the $m \times n$ matrix $A$ one obtains an $n \times m$ matrix $A^{\mathrm T}$ with elements

$$A^{\mathrm T}_{ik} = A_{ki} . \qquad (A.1.3)$$

Under transposition a column vector becomes a row vector,

$$\mathbf a^{\mathrm T} = (a_1, a_2, \ldots, a_m) . \qquad (A.1.4)$$


A  column  vector  is  an  m  x  1  matrix;  a  row  vector  is  a  1  x  m  matrix. 

For matrices one has the following elementary rules for addition, subtraction, and multiplication by a constant:

$$A \pm B = C , \qquad C_{ik} = A_{ik} \pm B_{ik} , \qquad (A.1.5)$$

$$\alpha A = B , \qquad B_{ik} = \alpha A_{ik} . \qquad (A.1.6)$$

The product $AB$ of two matrices is only defined if the number of columns of the first matrix is equal to the number of rows of the second, e.g., $A = A_{m \times \ell}$ and $B = B_{\ell \times m}$. One then has

$$AB = C , \qquad C_{ik} = \sum_{j=1}^{\ell} A_{ij} B_{jk} . \qquad (A.1.7)$$

Since

$$C^{\mathrm T}_{ik} = C_{ki} = \sum_{j=1}^{\ell} A_{kj} B_{ji} = \sum_{j=1}^{\ell} B^{\mathrm T}_{ij} A^{\mathrm T}_{jk} ,$$

one has

$$C^{\mathrm T} = (AB)^{\mathrm T} = B^{\mathrm T} A^{\mathrm T} . \qquad (A.1.8)$$


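With (A.1.7) one can also define the product of a row vector $\mathbf a^{\mathrm T}$ with a column vector $\mathbf b$, if both have the same number of elements $m$,

$$\mathbf a \cdot \mathbf b = \mathbf a^{\mathrm T} \mathbf b = c , \qquad c = \sum_{j=1}^{m} a_j b_j . \qquad (A.1.9)$$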

The result is then a number, i.e., a scalar. The product (A.1.9) is called the scalar product. It is usually written without indicating the transposition simply as $\mathbf a \cdot \mathbf b$. The vectors $\mathbf a$ and $\mathbf b$ are orthogonal to each other if their scalar product vanishes. Starting from (A.1.9) one obtains the following useful property of the matrix product (A.1.7). The element $C_{ik}$, which is located at the intersection of the $i$th row and the $k$th column of the product matrix $C$, is equal to the scalar product of the $i$th row of the first matrix $A$ with the $k$th column of the second matrix $B$.
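The row-by-column rule for (A.1.7), the transposition (A.1.3), and the scalar product (A.1.9) can be sketched in a few lines of Java. The class `MatrixBasics` is hypothetical and is not the DatanMatrix or DatanVector class of the book; it uses plain `double[][]` arrays and also checks the identity (A.1.8) on a small example.

```java
import java.util.Arrays;

/** Sketch of (A.1.3), (A.1.7)-(A.1.9); hypothetical helper, not DatanMatrix or DatanVector. */
final class MatrixBasics {
    /** C = AB with C_ik = sum_j A_ij B_jk; requires columns(A) == rows(B). */
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, l = b.length, n = b[0].length;
        if (a[0].length != l) throw new IllegalArgumentException("columns of A must equal rows of B");
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < n; k++) {
                double s = 0.0;                 // scalar product of row i of A with column k of B
                for (int j = 0; j < l; j++) {
                    s += a[i][j] * b[j][k];
                }
                c[i][k] = s;
            }
        }
        return c;
    }

    /** (A^T)_ik = A_ki, eq. (A.1.3). */
    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < a[0].length; k++) {
                t[k][i] = a[i][k];
            }
        }
        return t;
    }

    /** a . b = sum_j a_j b_j, eq. (A.1.9). */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("lengths differ");
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += a[j] * b[j];
        return s;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};        // 2 x 3
        double[][] b = {{1, 0}, {0, 1}, {1, 1}};      // 3 x 2
        double[][] c = multiply(a, b);                 // 2 x 2
        System.out.println(Arrays.deepToString(c));
        // (AB)^T equals B^T A^T, eq. (A.1.8)
        System.out.println(Arrays.deepEquals(transpose(c), multiply(transpose(b), transpose(a))));
    }
}
```

For the example matrices the program prints `[[4.0, 5.0], [10.0, 11.0]]` followed by `true`.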

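The diagonal elements of a matrix (A.1.2) are the elements $A_{ii}$. They form the main diagonal of the matrix $A$. If all of the non-diagonal elements vanish, $A_{ij} = 0$, $i \ne j$, then $A$ is a diagonal matrix. An $n \times n$ diagonal matrix all of whose diagonal elements are unity is the $n$-dimensional unit matrix

$$I_n = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} . \qquad (A.1.10)$$

The null matrix has only zeros as elements:

$$0 = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} . \qquad (A.1.11)$$

A null matrix with only one column is the null vector $\mathbf 0$.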
We will now mention several more special types of square matrices.

A square matrix is symmetric if

$$A_{ik} = A_{ki} . \qquad (A.1.12)$$

If

$$A_{ik} = -A_{ki} , \qquad (A.1.13)$$

then the matrix is antisymmetric.

A bidiagonal matrix $B$ possesses non-vanishing elements only on the main diagonal ($b_{ii}$) and on the parallel diagonal directly above it ($b_{i,i+1}$).

A tridiagonal matrix possesses in addition non-vanishing elements directly below the main diagonal. A lower triangular matrix has non-vanishing elements only on and below the main diagonal, an upper triangular matrix only on and above the main diagonal.

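The Euclidean norm or the absolute value of a vector is

$$|\mathbf a| = \|\mathbf a\|_2 = a = \sqrt{\mathbf a^{\mathrm T} \mathbf a} = \Big( \sum_j a_j^2 \Big)^{1/2} . \qquad (A.1.14)$$

A vector with unit norm is called a unit vector. We write this in the form

$$\hat{\mathbf a} = \mathbf a / a .$$

More general vector norms are

$$\|\mathbf a\|_p = \Big( \sum_j |a_j|^p \Big)^{1/p} , \quad 1 \le p < \infty , \qquad \|\mathbf a\|_\infty = \max_j |a_j| . \qquad (A.1.15)$$

For every vector norm $\|\mathbf x\|$ one defines a matrix norm $\|A\|$ as

$$\|A\| = \max_{\mathbf x \ne \mathbf 0} \frac{\|A\mathbf x\|}{\|\mathbf x\|} . \qquad (A.1.16)$$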

Matrix norms have the following properties:

$$ \|A\| > 0 , \quad A \neq 0 ; \qquad \|A\| = 0 , \quad A = 0 , \tag{A.1.17} $$
$$ \|\alpha A\| = |\alpha|\,\|A\| , \quad \alpha \ \text{real} , \tag{A.1.18} $$
$$ \|A + B\| \le \|A\| + \|B\| , \tag{A.1.19} $$
$$ \|AB\| \le \|A\|\,\|B\| . \tag{A.1.20} $$
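The norms (A.1.15) and (A.1.16) are easy to evaluate on small examples. The sketch below is a minimal Python illustration and is not part of the book's DatanMatrix class; the function names are hypothetical. For the matrix norm it uses the known closed forms of (A.1.16) in two cases: for p = 1 the norm is the largest absolute column sum, and for p = ∞ it is the largest absolute row sum. The norm induced by the Euclidean norm is the largest singular value (Sect. A.12) and is omitted here.

```python
import math

def vector_norm(a, p=2.0):
    """p-norm (A.1.15) of a vector given as a list; p = math.inf gives max |a_j|."""
    if p == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** p for x in a) ** (1.0 / p)

def matrix_norm_1(A):
    """Norm (A.1.16) induced by the 1-norm: largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def matrix_norm_inf(A):
    """Norm (A.1.16) induced by the max norm: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def matmul(A, B):
    """Plain matrix product C = AB for matrices stored as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1.0, -2.0], [3.0, 4.0]]
B = [[0.5, 1.0], [-1.0, 2.0]]
print(vector_norm([3.0, 4.0]))                # Euclidean norm (A.1.14): 5.0
print(vector_norm([3.0, -4.0], math.inf))     # 4.0
# property (A.1.20) for the induced 1-norm: 14.0 <= 6.0 * 3.0
print(matrix_norm_1(matmul(A, B)) <= matrix_norm_1(A) * matrix_norm_1(B))   # True
```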
A.2 Vector Space, Subspace, Rank of a Matrix


An n-dimensional vector space is the set of all n-dimensional vectors. If u and v are vectors in this space, then αu and u + v are also in the space. That is, the vector space is closed under vector addition and under multiplication with a scalar α. The vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k$ are linearly independent if

$$ \sum_{j=1}^{k} \alpha_j \mathbf{a}_j \neq \mathbf{0} \tag{A.2.1} $$

for all $\alpha_j$ except for $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. Otherwise they are linearly dependent. The maximum number $k_{\max}$ of vectors that can be linearly independent is equal to the dimension n of the vector space. An arbitrary set of n linearly independent vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ forms a basis of the vector space.
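Any vector a can be expressed as a linear combination of the basis vectors,

$$ \mathbf{a} = \sum_{j=1}^{n} \alpha_j \mathbf{a}_j . \tag{A.2.2} $$

A special basis is

$$ \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} , \quad \ldots , \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} . \tag{A.2.3} $$

These basis vectors are orthonormal, i.e.,

$$ \mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij} = \begin{cases} 1 , & i = j , \\ 0 , & i \neq j . \end{cases} \tag{A.2.4} $$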


The component $a_j$ of the vector a is the scalar product of a with the basis vector $\mathbf{e}_j$,

$$ \mathbf{a} \cdot \mathbf{e}_j = a_j , \tag{A.2.5} $$

cf. (A.1.1) and (A.1.9).

For n ≤ 3, vectors can be visualized geometrically. A vector a can be represented as an arrow of length a. The basis vectors (A.2.3) are perpendicular to each other and have unit length. The perpendicular projections of a onto the directions of the basis vectors are the components (A.2.5), as shown in Fig. A.1.
Fig. A.1: The vector a in the system of orthonormal basis vectors $\mathbf{e}_1$, $\mathbf{e}_2$.


A subset T of a vector space S is called a subspace if it is itself closed under vector addition and multiplication with a scalar. The greatest possible number of linearly independent vectors in T is the dimension of T. The product of an m × n matrix A with an n-vector a is an m-vector b,

$$ \mathbf{b} = A\mathbf{a} . \tag{A.2.6} $$

The relation (A.2.6) can be regarded as a mapping or transformation of the vector a onto the vector b.

The span of a set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_k$ is the vector space defined by the set of all linear combinations u of these vectors,

$$ \mathbf{u} = \sum_{j=1}^{k} \alpha_j \mathbf{a}_j . \tag{A.2.7} $$

It has a dimension m ≤ k. The column space of an m × n matrix A is the span of the n column vectors of A. In this case the m-vectors u have the form u = Ax with arbitrary n-vectors x. Clearly the dimension of the column space is ≤ min(m, n). Similarly, the row space of A is the span of the m row vectors. The null space or kernel of A consists of the set of vectors x for which

$$ A\mathbf{x} = \mathbf{0} . \tag{A.2.8} $$

Column and row spaces of an m × n matrix have the same dimension. This dimension is called the rank of the matrix. An m × n matrix has full rank if

$$ \operatorname{Rank}(A) = \min(m, n) . \tag{A.2.9} $$

Otherwise it has reduced rank. An n × n matrix with Rank(A) < n is said to be singular.
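Numerically, the rank can be estimated by reducing A to row echelon form and counting the non-zero pivots. The sketch below illustrates this with elimination and row pivoting, as described in Sect. A.7. The function name and the tolerance `tol` that decides when a pivot counts as zero are assumptions of this sketch. For nearly rank-deficient matrices, the singular value analysis of Sects. A.12 and A.13 is the more reliable tool.

```python
def matrix_rank(A, tol=1e-12):
    """Rank of an m x n matrix (list of rows) via row echelon form.

    Pivots with |value| <= tol are treated as zero (an illustrative choice)."""
    M = [[float(x) for x in row] for row in A]     # work on a copy
    m, n = len(M), len(M[0])
    rank = 0
    for col in range(n):
        if rank == m:
            break
        # partial pivoting: largest |entry| in this column among the remaining rows
        piv = max(range(rank, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) <= tol:
            continue                               # no pivot in this column
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(rank + 1, m):
            f = M[r][col] / M[rank][col]
            for c in range(col, n):
                M[r][c] -= f * M[rank][c]
        rank += 1
    return rank

print(matrix_rank([[1, 2, 3], [2, 4, 6]]))       # 1: reduced rank
print(matrix_rank([[1, 2], [3, 4], [5, 6]]))     # 2 = min(m, n): full rank
print(matrix_rank([[1, 2], [2, 4]]))             # 1 < n: singular
```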
A vector a is orthogonal to a subspace T if it is orthogonal to every vector t ∈ T. (A trivial example is $\mathbf{t} = t_1\mathbf{e}_1 + t_2\mathbf{e}_2$, $\mathbf{a} = a_3\mathbf{e}_3$, $\mathbf{a}\cdot\mathbf{t} = 0$.) A subspace U is orthogonal to the subspace T if u · t = 0 for every pair of vectors u ∈ U, t ∈ T. The set of all vectors u + t forms a vector space V, called the direct sum of T and U,

$$ V = T \oplus U . \tag{A.2.10} $$

Its dimension is

$$ \dim(V) = \dim(T) + \dim(U) . \tag{A.2.11} $$

If (A.2.10) holds, then T and U are subspaces of V. They are called orthogonal complements, $T = U^{\perp}$, $U = T^{\perp}$. If T is a subspace of S, then there always exists an orthogonal complement $T^{\perp}$ such that $S = T \oplus T^{\perp}$. Every vector a ∈ S can then be uniquely decomposed into the form a = t + u with t ∈ T and $\mathbf{u} \in T^{\perp}$. For the norms of the vectors the relation $a^2 = t^2 + u^2$ holds.

Let T be an (n − 1)-dimensional subspace of an n-dimensional vector space S, and let s be a fixed vector in S. Then the set of all vectors h = s + t with t ∈ T forms an (n − 1)-dimensional hyperplane H in S. Conversely, if H is given and $\mathbf{h}_0$ is an arbitrary fixed vector in H, then T is the set of all vectors $\mathbf{t} = \mathbf{h} - \mathbf{h}_0$, h ∈ H, as shown in Fig. A.2. If u is a unit vector in the one-dimensional subspace $T^{\perp}$ orthogonal to H, then the scalar product

$$ \mathbf{u} \cdot \mathbf{h} = d \tag{A.2.12} $$

has the same value for all h ∈ H, where d is the distance of the hyperplane from the origin (see Fig. A.3).


Fig. A.2: Hyperplane H in a two-dimensional vector space.

For a given u and d, Eq. (A.2.12) defines a hyperplane H. It divides the n-dimensional vector space into two half spaces: the set of vectors x for which u · x < d and the set for which u · x > d.
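As a small numerical illustration of (A.2.12), the Python sketch below takes a normal vector, normalizes it, and takes one point $\mathbf{h}_0$ of H. From these it computes d and reports on which side of H a given vector x lies. The helper names are hypothetical. If d comes out negative, replacing u by −u turns it into the non-negative distance from the origin.

```python
import math

def hyperplane(normal, h0):
    """Unit normal u and d = u . h0 for the hyperplane through h0, cf. (A.2.12)."""
    length = math.sqrt(sum(x * x for x in normal))
    u = [x / length for x in normal]
    d = sum(ui * hi for ui, hi in zip(u, h0))
    return u, d

def side(u, d, x):
    """+1 if u . x > d, -1 if u . x < d, 0 if x lies on the hyperplane."""
    s = sum(ui * xi for ui, xi in zip(u, x)) - d
    return (s > 0) - (s < 0)

# the line x2 = 1 in two dimensions, given by a (non-unit) normal and one of its points
u, d = hyperplane([0.0, 2.0], [5.0, 1.0])
print(u, d)                                                                  # [0.0, 1.0] 1.0
print(side(u, d, [0.0, 3.0]), side(u, d, [7.0, -2.0]), side(u, d, [-3.0, 1.0]))   # 1 -1 0
```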

Fig. A.3: Hyperplane H and complementary one-dimensional vector space $T^{\perp}$.

A.3 Orthogonal Transformations

According to (A.2.6), the mapping of an n-vector a onto an n-vector b is performed by multiplication with a square n × n matrix, b = Qa. If the length of the vector (A.1.14) remains unchanged, one speaks of an orthogonal transformation. In such a case one has |b| = |a|, or $b^2 = a^2$, i.e.,

$$ \mathbf{b}^{\mathrm T}\mathbf{b} = \mathbf{a}^{\mathrm T} Q^{\mathrm T} Q \mathbf{a} = \mathbf{a}^{\mathrm T}\mathbf{a} , $$

and thus

$$ Q^{\mathrm T} Q = I . \tag{A.3.1} $$

A  square  matrix  Q  that  fulfills  (A.3.1)  is  said  to  be  orthogonal. 

It is clear that transformations are orthogonal when the transformed vector b is obtained from a by means of a spatial rotation and/or reflection. We will examine some orthogonal transformations that are important for the applications of this appendix.


A.3.1  Givens  Transformation 


The  Givens  rotation  is  a  transformation  that  affects  only  components  in  a 
plane  spanned  by  two  orthogonal  basis  vectors.  For  simplicity  we  will  first 
consider  only  two-dimensional  vectors. 


Fig. A.4: Application of the Givens transformation (a) to the vector v that defines the transformation and (b) to an arbitrary vector u.
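A vector v can be represented as

$$ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = v \begin{pmatrix} c \\ s \end{pmatrix} , \qquad c = \cos\vartheta = v_1/v , \quad s = \sin\vartheta = v_2/v . \tag{A.3.2} $$

A rotation by an angle −ϑ transforms the vector v into the vector v' = Gv, whose second component vanishes, as in Fig. A.4a. Clearly one has

$$ G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} , \qquad \mathbf{v}' = G\mathbf{v} = \begin{pmatrix} v \\ 0 \end{pmatrix} . \tag{A.3.3} $$

The transformation defined in this way by the vector v can also be applied to any other vector u, as shown in Fig. A.4b. In n dimensions the Givens rotation leads to the transformation

$$ \mathbf{v} \to \mathbf{v}' = G\mathbf{v} $$

such that $v'_k = 0$. The components $v'_l = v_l$, $l \neq k$, $l \neq i$, remain unchanged. The component $v'_i$ is determined such that the norm of the vector remains unchanged, $v' = v$. This is achieved by

$$ G = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & c & \cdots & s & & \\
& & \vdots & \ddots & \vdots & & \\
& & -s & \cdots & c & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix} , \qquad c = \frac{v_i}{\sqrt{v_i^2+v_k^2}} , \quad s = \frac{v_k}{\sqrt{v_i^2+v_k^2}} . \tag{A.3.4} $$

Here c stands in positions ii and kk, s in position ik, and −s in position ki. All other diagonal elements are 1 and all other off-diagonal elements vanish, so that $v'_i = \sqrt{v_i^2+v_k^2}$.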
In practical applications, however, the full matrix is not needed. The method DatanMatrix.defineGivensTransformation defines a Givens transformation from the two input components $v_1$, $v_2$. DatanMatrix.applyGivensTransformation applies that transformation to two components of another vector. The method DatanMatrix.defineAndApplyGivensTransformation defines a transformation and directly applies it to the defining vector.
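The division of labor described above can be sketched in a few lines: the rotation is defined from two components and then applied to component pairs of other vectors. This is a Python illustration with hypothetical function names, not the Java DatanMatrix methods themselves. It uses `math.hypot` to form $\sqrt{v_1^2+v_2^2}$ without unnecessary overflow or underflow.

```python
import math

def define_givens(v1, v2):
    """c, s of (A.3.2) such that G = [[c, s], [-s, c]] maps (v1, v2) to (v, 0)."""
    v = math.hypot(v1, v2)
    if v == 0.0:
        return 1.0, 0.0                  # zero vector: use the identity rotation
    return v1 / v, v2 / v

def apply_givens(c, s, u1, u2):
    """Apply the rotation (A.3.3) to the component pair (u1, u2)."""
    return c * u1 + s * u2, -s * u1 + c * u2

def zero_component(vec, i, k):
    """n-dimensional case (A.3.4): rotate in the (i, k) plane so that vec[k] = 0."""
    c, s = define_givens(vec[i], vec[k])
    out = list(vec)
    out[i], out[k] = apply_givens(c, s, vec[i], vec[k])
    return out

c, s = define_givens(3.0, 4.0)
print(c, s)                              # 0.6 0.8
print(apply_givens(c, s, 3.0, 4.0))      # approximately (5.0, 0.0)
print(zero_component([1.0, 2.0, 2.0], 0, 2))   # approximately [sqrt(5), 2.0, 0.0]
```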


A.3.2 Householder Transformation

The Givens rotation transforms a vector in such a way that one given vector component vanishes. A more general transformation is the Householder transformation. If the original vector is

$$ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} , \tag{A.3.5} $$

then for the transformed vector we want to have

$$ \mathbf{v}' = H\mathbf{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_{p-1} \\ v'_p \\ v_{p+1} \\ \vdots \\ v_{\ell-1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} . \tag{A.3.6} $$

That is, the components $v'_\ell, v'_{\ell+1}, \ldots, v'_n$ should vanish. With the exception of $v'_p$, the remaining components should stay unchanged. The component $v'_p$ must be changed in such a way that $|\mathbf{v}'| = |\mathbf{v}|$. From the Pythagorean theorem in the (n − ℓ + 2)-dimensional subspace spanned by $\mathbf{e}_p, \mathbf{e}_\ell, \ldots, \mathbf{e}_n$ one has

$$ v'^2_p = v_H^2 = v_p^2 + \sum_{i=\ell}^{n} v_i^2 $$

or

$$ v'_p = -\sigma v_H = -\sigma \sqrt{v_p^2 + \sum_{i=\ell}^{n} v_i^2} \tag{A.3.7} $$

with σ = ±1. We choose

$$ \sigma = \operatorname{sign}(v_p) . \tag{A.3.8} $$

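We now construct the matrix H of (A.3.6). To do this we decompose the vector v into a sum,

$$ \mathbf{v} = \mathbf{v}_H + \mathbf{v}_{H^\perp} , \qquad
\mathbf{v}_H = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ v_p \\ 0 \\ \vdots \\ 0 \\ v_\ell \\ \vdots \\ v_n \end{pmatrix} , \qquad
\mathbf{v}_{H^\perp} = \begin{pmatrix} v_1 \\ \vdots \\ v_{p-1} \\ 0 \\ v_{p+1} \\ \vdots \\ v_{\ell-1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} . \tag{A.3.9} $$

Here the vector $\mathbf{v}_H$ lies in the subspace spanned by the basis vectors $\mathbf{e}_p, \mathbf{e}_\ell, \mathbf{e}_{\ell+1}, \ldots, \mathbf{e}_n$, and $\mathbf{v}_{H^\perp}$ lies in the subspace orthogonal to it. The length of $\mathbf{v}_H$ is the quantity $v_H$ of (A.3.7). We now construct

$$ \mathbf{u} = \mathbf{v}_H + \sigma v_H \mathbf{e}_p \tag{A.3.10} $$

and

$$ H = I_n - \frac{2\,\mathbf{u}\mathbf{u}^{\mathrm T}}{u^2} . \tag{A.3.11} $$

If we now decompose an arbitrary vector a into a sum of vectors parallel and perpendicular to u,

$$ \mathbf{a} = \mathbf{a}_\parallel + \mathbf{a}_\perp , $$

with

$$ \mathbf{a}_\parallel = \frac{\mathbf{u}\mathbf{u}^{\mathrm T}}{u^2}\,\mathbf{a} = \frac{\mathbf{u}}{u^2}(\mathbf{u}\cdot\mathbf{a}) = \hat{\mathbf{u}}(\hat{\mathbf{u}}\cdot\mathbf{a}) , \qquad \mathbf{a}_\perp = \mathbf{a} - \mathbf{a}_\parallel , $$

then one has

$$ \mathbf{a}'_\parallel = H\mathbf{a}_\parallel = -\mathbf{a}_\parallel , \qquad \mathbf{a}'_\perp = H\mathbf{a}_\perp = \mathbf{a}_\perp . $$

Thus the transformation is a reflection in the subspace that is orthogonal to the vector u, as in Fig. A.5. One can easily verify that H in fact yields the transformation (A.3.6).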

Fig. A.5: The vector v is mapped onto v' according to a Householder transformation such that $v'_2 = 0$. The mapping corresponds to a reflection in the subspace U that is orthogonal to the auxiliary vector u.

The transformation is uniquely determined by the vector u. According to (A.3.10), the components of this vector are $u_p = v_p + \sigma v_H$, $u_\ell = v_\ell$, $u_{\ell+1} = v_{\ell+1}$, ..., $u_n = v_n$, and $u_i = 0$ for all other i. If the vector v and the indices p and ℓ are given, then only $u_p$ must be computed. The quantity $u^2$ appearing in (A.3.11) is then

$$ \begin{aligned}
u^2 &= u_p^2 + \sum_{i=\ell}^{n} u_i^2 = (v_p + \sigma v_H)^2 + \sum_{i=\ell}^{n} v_i^2 \\
&= v_p^2 + \sum_{i=\ell}^{n} v_i^2 + v_H^2 + 2\sigma v_H v_p \\
&= 2v_H^2 + 2\sigma v_H v_p = 2v_H(v_H + \sigma v_p) = 2\sigma v_H u_p .
\end{aligned} $$

The last step uses $\sigma^2 = 1$, so that $v_H + \sigma v_p = \sigma(\sigma v_H + v_p) = \sigma u_p$. We can thus write (A.3.11) in the form

$$ H = I_n - b\,\mathbf{u}\mathbf{u}^{\mathrm T} , \qquad b = (\sigma v_H u_p)^{-1} . \tag{A.3.12} $$

The matrix H, however, is not needed explicitly to compute a transformed vector,

$$ \mathbf{c}' = H\mathbf{c} = \mathbf{c} - b\,\mathbf{u}\,(\mathbf{u}\cdot\mathbf{c}) . $$

It is sufficient to know the vector u and the constant b. Moreover, u differs from v only in the element $u_p$ and in the vanishing elements. Therefore it is sufficient, starting from v, to first compute the quantities $u_p$ and b. When applying the transformation, one then additionally uses the elements $v_\ell, v_{\ell+1}, \ldots, v_n$, which are at the same time the corresponding elements of u.

A transformation is defined by the method DatanMatrix.defineHouseholderTransformation; with DatanMatrix.applyHouseholderTransformation it is applied to a vector.
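Following (A.3.7) to (A.3.12), a transformation is fully described by $u_p$ and b together with the untouched elements $v_\ell, \ldots, v_n$ of v. The Python sketch below mirrors this split into a defining step and an applying step. The function names and the 0-based indices p < l are assumptions of the sketch, not the signatures of the DatanMatrix methods. sign(0) is taken as +1.

```python
import math

def define_householder(v, p, l):
    """Return (up, b) for the transformation that zeroes v[l:], changes v[p]
    and leaves the other elements unchanged; 0-based indices, p < l."""
    vH = math.sqrt(v[p] ** 2 + sum(x * x for x in v[l:]))   # (A.3.7)
    if vH == 0.0:
        return 0.0, 0.0                  # nothing to annihilate: b = 0 gives H = I
    sigma = 1.0 if v[p] >= 0.0 else -1.0                     # (A.3.8)
    up = v[p] + sigma * vH                                   # (A.3.10)
    b = 1.0 / (sigma * vH * up)                              # (A.3.12), b = 2/u^2
    return up, b

def apply_householder(v, p, l, up, b, c):
    """Return c' = H c = c - b u (u . c), with u built from up and v[l:]."""
    u = [0.0] * len(v)
    u[p] = up
    u[l:] = v[l:]
    uc = sum(ui * ci for ui, ci in zip(u, c))
    return [ci - b * uc * ui for ui, ci in zip(u, c)]

v = [1.0, 3.0, 2.0, 4.0]
up, b = define_householder(v, 1, 3)      # change v[1], keep v[0] and v[2], zero v[3]
print(up, b)                              # 8.0 0.025
print(apply_householder(v, 1, 3, up, b, v))   # approximately [1.0, -5.0, 2.0, 0.0]
```

For a negative $v_p$, e.g. v = (−3, 4) with p = 0, l = 1, the factor σ in b is what makes the result come out as (5, 0).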
A.3.3 Sign Inversion


If the diagonal element $I_{ii}$ of the unit matrix is replaced by −1, then one obtains a symmetric orthogonal matrix,

$$ R^{(i)} = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & -1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \quad (\text{element } -1 \text{ in row and column } i) . $$

Applying this to the vector a,

$$ \mathbf{a}' = R^{(i)}\mathbf{a} , $$

changes the sign of the element $a_i$ and leaves all the other elements unchanged. Clearly $R^{(i)}$ is a Householder matrix which produces a reflection in the subspace orthogonal to the basis vector $\mathbf{e}_i$. This can be seen immediately by substituting $\mathbf{u} = \mathbf{e}_i$ in (A.3.11).


A.3.4 Permutation Transformation


The n × n unit matrix (A.1.10) can easily be written as an arrangement of the basis vectors (A.2.3) in a square matrix,

$$ I_n = (\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n) . $$

It is clearly an orthogonal matrix. The orthogonal transformation Ia = a leaves the vector a unchanged. If we now exchange two of the basis vectors $\mathbf{e}_i$ and $\mathbf{e}_k$, we obtain the symmetric orthogonal matrix $P^{(ik)}$. As an example we show this for n = 4, i = 2, k = 4,

$$ P^{(ik)} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} . $$

The transformation $\mathbf{a}' = P^{(ik)}\mathbf{a}$ exchanges the elements $a_i$ and $a_k$. All other elements remain unchanged:

$$ \mathbf{a} = \begin{pmatrix} a_1 \\ \vdots \\ a_i \\ \vdots \\ a_k \\ \vdots \\ a_n \end{pmatrix} , \qquad \mathbf{a}' = P^{(ik)}\mathbf{a} = \begin{pmatrix} a_1 \\ \vdots \\ a_k \\ \vdots \\ a_i \\ \vdots \\ a_n \end{pmatrix} . $$


Multiplying an n × m matrix A from the left by $P^{(ik)}$ exchanges the rows i and k of A. Multiplying an m × n matrix A from the right exchanges the columns i and k. If D is an n × n diagonal matrix, then the elements $D_{ii}$ and $D_{kk}$ are exchanged by the operation

$$ D' = P^{(ik)} D P^{(ik)} . $$
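The statements about $P^{(ik)}$ can be checked directly. The Python sketch below uses hypothetical helper names and 0-based indices. It builds $P^{(ik)}$ by exchanging two rows of the unit matrix, which for this symmetric matrix is the same as exchanging the basis vectors $\mathbf{e}_i$ and $\mathbf{e}_k$. It then checks that $P^{(ik)}DP^{(ik)}$ swaps two diagonal elements. In programs one would normally exchange rows or columns directly rather than multiply by $P^{(ik)}$.

```python
def permutation_matrix(n, i, k):
    """P^(ik): the n x n unit matrix with rows (equivalently columns) i and k exchanged."""
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    P[i], P[k] = P[k], P[i]
    return P

def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[r][l] * B[l][c] for l in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

P = permutation_matrix(4, 1, 3)          # the example n = 4, i = 2, k = 4 above
D = [[float(r == c) * (r + 1) for c in range(4)] for r in range(4)]   # diag(1, 2, 3, 4)
PDP = matmul(matmul(P, D), P)
print([PDP[r][r] for r in range(4)])     # [1.0, 4.0, 3.0, 2.0]
```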


A.4 Determinants

To every n × n matrix A one can associate a number, its determinant, det A. A determinant is written like the corresponding matrix, as a square arrangement of the matrix elements, but it is enclosed by vertical lines. The determinants of orders two and three are defined by

$$ \det A = \begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = A_{11}A_{22} - A_{12}A_{21} \tag{A.4.1} $$

and

$$ \det A = \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix} = \begin{aligned}[t] & A_{11}A_{22}A_{33} + A_{12}A_{23}A_{31} + A_{13}A_{21}A_{32} \\ & - A_{11}A_{23}A_{32} - A_{12}A_{21}A_{33} - A_{13}A_{22}A_{31} \end{aligned} \tag{A.4.2} $$

or, written in another way,

$$ \det A = A_{11}(A_{22}A_{33} - A_{23}A_{32}) - A_{12}(A_{21}A_{33} - A_{23}A_{31}) + A_{13}(A_{21}A_{32} - A_{22}A_{31}) . \tag{A.4.3} $$

A general determinant of order n is written in the form

$$ \det A = \begin{vmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{vmatrix} . \tag{A.4.4} $$

The cofactor $\tilde{A}_{ij}$ of the element $A_{ij}$ of a matrix is a determinant of order (n − 1). It is calculated from the matrix obtained by deleting the ith row and jth column of the original matrix and is multiplied by $(-1)^{i+j}$,

$$ \tilde{A}_{ij} = (-1)^{i+j} \begin{vmatrix}
A_{11} & \cdots & A_{1,j-1} & A_{1,j+1} & \cdots & A_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
A_{i-1,1} & \cdots & A_{i-1,j-1} & A_{i-1,j+1} & \cdots & A_{i-1,n} \\
A_{i+1,1} & \cdots & A_{i+1,j-1} & A_{i+1,j+1} & \cdots & A_{i+1,n} \\
\vdots & & \vdots & \vdots & & \vdots \\
A_{n1} & \cdots & A_{n,j-1} & A_{n,j+1} & \cdots & A_{nn}
\end{vmatrix} . \tag{A.4.5} $$

Determinants of higher order can be written as the sum of all elements of any row or column multiplied with the corresponding cofactors,

$$ \det A = \sum_{k=1}^{n} A_{ik}\tilde{A}_{ik} = \sum_{k=1}^{n} A_{kj}\tilde{A}_{kj} . \tag{A.4.6} $$


One can easily show that the result is independent of the choice of row i or column j. Equation (A.4.3) already shows that (A.4.6) is correct for n = 3. Determinants of arbitrary order can be computed by expanding them in cofactors repeatedly until one reaches, for example, order two. A singular matrix, i.e., a square matrix whose rows or columns are not linearly independent, has determinant zero.
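The recursive definition (A.4.6) translates directly into code. The Python sketch below uses a hypothetical function name and always expands along the first row. It only illustrates the definition: the number of terms grows like n!, so for anything but small matrices determinants are better obtained from a triangular decomposition such as the one in Sect. A.8.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row, (A.4.6) with i = 1."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]        # (A.4.1)
    total = 0
    for k in range(n):
        # delete row 1 and column k+1; (-1)**k equals the sign (-1)^(1+(k+1)) in (A.4.5)
        minor = [row[:k] + row[k + 1:] for row in A[1:]]
        total += (-1) ** k * A[0][k] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2, 3], [2, 1, -2], [1, 1, 2]]))     # -5
```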

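From A we can construct a further matrix by replacing each element $A_{ij}$ by the cofactor $\tilde{A}_{ji}$ of the element $A_{ji}$. In this way we obtain the adjoint matrix of A,

$$ A^{\dagger} = \begin{pmatrix} \tilde{A}_{11} & \tilde{A}_{21} & \cdots & \tilde{A}_{n1} \\ \tilde{A}_{12} & \tilde{A}_{22} & \cdots & \tilde{A}_{n2} \\ \vdots & \vdots & & \vdots \\ \tilde{A}_{1n} & \tilde{A}_{2n} & \cdots & \tilde{A}_{nn} \end{pmatrix} . \tag{A.4.7} $$

For determinants the following rules hold:

$$ \det A = \det A^{\mathrm T} , \tag{A.4.8} $$
$$ \det AB = \det A \det B . \tag{A.4.9} $$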


For an orthogonal matrix Q one has $QQ^{\mathrm T} = I$, i.e.,

$$ \det I = 1 = \det Q \det Q^{\mathrm T} = (\det Q)^2 $$

and thus

$$ \det Q = \pm 1 . \tag{A.4.10} $$


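A.5 Matrix Equations: Least Squares


A system of m linear equations with n unknowns $x_1, x_2, \ldots, x_n$ has the general form

$$ \begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n - b_1 &= 0 , \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n - b_2 &= 0 , \\
&\ \ \vdots \\
A_{m1}x_1 + A_{m2}x_2 + \cdots + A_{mn}x_n - b_m &= 0
\end{aligned} \tag{A.5.1} $$

or in matrix notation

$$ A\mathbf{x} - \mathbf{b} = \mathbf{0} . \tag{A.5.2} $$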


In finding the solution x to this equation we must distinguish between various cases, which can be characterized by the values of m, n, and

$$ k = \operatorname{Rank}(A) . $$

The vector Ax lies in the column space of A, which has dimension k. Since b is an m-vector, the equation can in general only be fulfilled if k = m. Because k ≤ min(m, n), this means k = n = m or k = m < n. For k = n = m one has n independent equations (A.5.1) with n unknowns, which have a unique solution. If k = m < n, then there are arbitrarily many n-vectors x that are mapped onto the m-vector Ax = b such that (A.5.2) is fulfilled. The system of equations is underdetermined, and the solution is not unique.

For k = Rank(A) ≠ m there is, in general, no solution of (A.5.2). In this case we look for a vector x for which the left-hand side of (A.5.2) is a vector of minimum Euclidean norm. That is, we replace Eq. (A.5.2) by

$$ r^2 = (A\mathbf{x} - \mathbf{b})^2 = \min , \tag{A.5.3} $$

i.e., we look for a vector x whose image Ax differs as little as possible from b.

Given an m-vector c, there exists at most one n-vector x with Ax = c only if k = Rank(A) = n. Therefore, only for the case k = n is there a unique solution x. Thus for Rank(A) = n and m ≥ n there exists a unique solution x of (A.5.3). For n = m one has r = 0.
The relation (A.5.3) is also often written simply in the form

$$ A\mathbf{x} - \mathbf{b} \approx \mathbf{0} \tag{A.5.4} $$

or even, not entirely correctly, in the form (A.5.2). The solution vector x is called the least-squares solution of (A.5.4). Table A.1 summarizes the various cases that result from different relationships between m, n, and k.


Table A.1: Behavior of the solutions of (A.5.3) for various cases. m is the number of rows, n the number of columns, and k the rank of the matrix A.

Case   m, n     Rank(A)   Residual   Solution unique
1a     m = n    k = n     r = 0      Yes
1b     m = n    k < n     r ≥ 0      No
2a     m > n    k = n     r ≥ 0      Yes
2b     m > n    k < n     r ≥ 0      No
3a     m < n    k = m     r = 0      No
3b     m < n    k < m     r ≥ 0      No

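We now want to state more formally what we have established in this section about the solution of (A.5.4).

Theorem on the orthogonal decomposition of a matrix:

Every m × n matrix A of rank k can be written in the form

$$ A = H R K^{\mathrm T} , \tag{A.5.5} $$

where H is an m × m orthogonal matrix, K is an n × n orthogonal matrix, and R is an m × n matrix of the form

$$ R = \begin{pmatrix} R_{11} & 0 \\ 0 & 0 \end{pmatrix} , \tag{A.5.6} $$

and where $R_{11}$ is a k × k matrix of rank k.

Substituting (A.5.5) into (A.5.4) and multiplying from the left by $H^{\mathrm T}$ leads to

$$ R K^{\mathrm T}\mathbf{x} \approx H^{\mathrm T}\mathbf{b} . \tag{A.5.7} $$

We define

$$ H^{\mathrm T}\mathbf{b} = \mathbf{g} = \begin{pmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{pmatrix} \begin{matrix} \}\,k \\ \}\,m-k \end{matrix} \tag{A.5.8} $$

and

$$ K^{\mathrm T}\mathbf{x} = \mathbf{p} = \begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \end{pmatrix} \begin{matrix} \}\,k \\ \}\,n-k \end{matrix} , \tag{A.5.9} $$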
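so that (A.5.7) takes on the form

$$ R\mathbf{p} = \begin{pmatrix} R_{11} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \end{pmatrix} \approx \begin{pmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{pmatrix} . \tag{A.5.10} $$

Because of (A.5.6), this breaks into two independent relations

$$ R_{11}\mathbf{p}_1 = \mathbf{g}_1 \tag{A.5.11} $$

and

$$ 0 \cdot \mathbf{p}_2 \approx \mathbf{g}_2 . \tag{A.5.12} $$

If m = k and/or n = k, then the corresponding lower partial vectors are absent in (A.5.8) and/or (A.5.9), and the corresponding lower matrices are absent in (A.5.6). In (A.5.11), $R_{11}$ is a k × k matrix of rank k, and $\mathbf{p}_1$ and $\mathbf{g}_1$ are k-vectors. Therefore there exists a solution vector $\tilde{\mathbf{p}}_1$ for which equality holds. Because of the null matrix on the left-hand side of (A.5.12), this relation gives no information about $\mathbf{p}_2$.

Theorem on the solutions of Ax ≈ b: If $\tilde{\mathbf{p}}_1$ is the unique solution vector of (A.5.11), then the following statements hold:

(i) All solutions of (A.5.3) have the form

$$ \mathbf{x} = K \begin{pmatrix} \tilde{\mathbf{p}}_1 \\ \mathbf{p}_2 \end{pmatrix} . \tag{A.5.13} $$

Here $\mathbf{p}_2$ is an arbitrary (n − k)-vector, i.e., the solution is unique for k = n. There is always, however, a unique solution of minimum absolute value:

$$ \tilde{\mathbf{x}} = K \begin{pmatrix} \tilde{\mathbf{p}}_1 \\ \mathbf{0} \end{pmatrix} . \tag{A.5.14} $$

(ii) All solutions x have the same residual vector

$$ \mathbf{r} = \mathbf{b} - A\tilde{\mathbf{x}} = H \begin{pmatrix} \mathbf{0} \\ \mathbf{g}_2 \end{pmatrix} \tag{A.5.15} $$

with the absolute value $r = g_2 = |\mathbf{g}_2|$. The residual vanishes for k = m.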

The problem Ax ≈ b should always be handled with orthogonal decompositions, preferably with the singular value decomposition and singular value analysis described in Sects. A.12 and A.13. Numerically the results are at least as accurate as with other methods, and they are often more accurate (cf. Sect. A.13, Example A.4).
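To make the theorem concrete, consider the most common situation, full column rank k = n ≤ m. One may then take K = I_n in (A.5.5). H is then a product of n Householder matrices (A.3.11), and $R_{11}$ is upper triangular. Consequently (A.5.11) is solved by back substitution, and $|\mathbf{g}_2|$ is the residual r of (A.5.15). The Python sketch below is a minimal illustration under exactly that assumption and uses a hypothetical function name. It is not the DatanMatrix implementation and does not handle rank deficiency; for that case the singular value decomposition of Sect. A.12 is the appropriate tool.

```python
import math

def lstsq_householder(A, b):
    """Least-squares solution of A x ~ b, (A.5.4), for an m x n matrix A of
    full column rank n <= m. Householder reflections turn A into R and b into
    g = H^T b; then R11 x = g1 is solved by back substitution and |g2| = r."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    g = [float(x) for x in b]
    for j in range(n):
        # reflection that zeroes column j below the diagonal (p = j, l = j + 1)
        vH = math.sqrt(sum(R[i][j] ** 2 for i in range(j, m)))
        if vH == 0.0:
            raise ValueError("A does not have full column rank")
        sigma = 1.0 if R[j][j] >= 0.0 else -1.0
        u = [0.0] * m
        u[j] = R[j][j] + sigma * vH
        for i in range(j + 1, m):
            u[i] = R[i][j]
        beta = 1.0 / (sigma * vH * u[j])                   # = 2 / u^2, cf. (A.3.12)
        for c in range(j, n):                              # apply H to columns j..n-1
            s = beta * sum(u[i] * R[i][c] for i in range(j, m))
            for i in range(j, m):
                R[i][c] -= s * u[i]
        s = beta * sum(u[i] * g[i] for i in range(j, m))   # ... and to g
        for i in range(j, m):
            g[i] -= s * u[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                         # back substitution in R11
        x[i] = (g[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    r = math.sqrt(sum(gi * gi for gi in g[n:]))            # residual norm |g2|
    return x, r

# straight line y = x1 + x2 * t fitted to four points (t = 0, 1, 2, 3)
A = [[1, 0], [1, 1], [1, 2], [1, 3]]
x, r = lstsq_householder(A, [1, 2, 2, 4])
print(x, r)       # approximately [0.9, 0.9] and sqrt(0.7)
```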
Nevertheless we will briefly present the method of normal equations as well. The method is very transparent compared to the orthogonal decomposition and is therefore always described in textbooks.

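We consider the square (A.5.3) of the residual vector

$$ \begin{aligned}
r^2 &= (A\mathbf{x} - \mathbf{b})^{\mathrm T}(A\mathbf{x} - \mathbf{b}) = \mathbf{x}^{\mathrm T}A^{\mathrm T}A\mathbf{x} - 2\mathbf{b}^{\mathrm T}A\mathbf{x} + \mathbf{b}^{\mathrm T}\mathbf{b} \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{l=1}^{n} A_{ij}A_{il}x_j x_l - 2\sum_{i=1}^{m}\sum_{j=1}^{n} b_i A_{ij}x_j + \sum_{i=1}^{m} b_i^2 .
\end{aligned} $$

The requirement $r^2 = \min$ leads to

$$ \frac{\partial r^2}{\partial x_k} = 2\sum_{i=1}^{m}\sum_{l=1}^{n} A_{ik}A_{il}x_l - 2\sum_{i=1}^{m} b_i A_{ik} = 0 , \qquad k = 1, \ldots, n . $$

These n linear equations are called normal equations. In matrix form they read

$$ A^{\mathrm T}A\mathbf{x} = A^{\mathrm T}\mathbf{b} . \tag{A.5.16} $$

This is a system of n equations with n unknowns. If the equations are linearly independent, then the n × n matrix $(A^{\mathrm T}A)$ has full rank n, i.e., it is not singular. According to Sect. A.6 there then exists an inverse $(A^{\mathrm T}A)^{-1}$ such that $(A^{\mathrm T}A)^{-1}(A^{\mathrm T}A) = I$. Thus the desired solution to (A.5.3) is

$$ \mathbf{x} = (A^{\mathrm T}A)^{-1}A^{\mathrm T}\mathbf{b} . \tag{A.5.17} $$

This simple prescription is, however, useless if $A^{\mathrm T}A$ is singular or nearly singular (cf. Example A.4).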
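For comparison, here is a minimal Python sketch of the normal-equation route (A.5.16)/(A.5.17) with a hypothetical function name. Rather than forming the inverse, it solves $A^{\mathrm T}A\mathbf{x} = A^{\mathrm T}\mathbf{b}$ by elimination with pivoting (Sect. A.7). One way to understand the warning above is that forming $A^{\mathrm T}A$ squares the condition number of A, so near-singularity of A is strongly amplified.

```python
def lstsq_normal_equations(A, b):
    """Least squares via the normal equations A^T A x = A^T b, (A.5.16)."""
    m, n = len(A), len(A[0])
    N = [[float(sum(A[i][j] * A[i][l] for i in range(m))) for l in range(n)]
         for j in range(n)]                                        # A^T A
    c = [float(sum(A[i][j] * b[i] for i in range(m))) for j in range(n)]   # A^T b
    for k in range(n):                                             # forward elimination
        piv = max(range(k, n), key=lambda r: abs(N[r][k]))
        if N[piv][k] == 0.0:
            raise ValueError("A^T A is singular")
        N[k], N[piv] = N[piv], N[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, n):
            f = N[r][k] / N[k][k]
            for col in range(k, n):
                N[r][col] -= f * N[k][col]
            c[r] -= f * c[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                                 # back substitution
        x[i] = (c[i] - sum(N[i][l] * x[l] for l in range(i + 1, n))) / N[i][i]
    return x

A = [[1, 0], [1, 1], [1, 2], [1, 3]]
print(lstsq_normal_equations(A, [1, 2, 2, 4]))   # approximately [0.9, 0.9]
```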


A.6 Inverse Matrix

For every non-singular n × n matrix A, the inverse matrix $A^{-1}$ is defined by

$$ AA^{-1} = I_n = A^{-1}A . \tag{A.6.1} $$

It is also an n × n matrix.

If $A^{-1}$ is known, then the solution of the matrix equation

$$ A\mathbf{x} = \mathbf{b} \tag{A.6.2} $$

is given simply by

$$ \mathbf{x} = A^{-1}\mathbf{b} . \tag{A.6.3} $$

As we will show, $A^{-1}$ only exists for non-singular square matrices, so that (A.6.3) only gives the solution for the case 1a of Table A.1.
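In order to determine $A^{-1}$, we set $A^{-1} = X$ and denote the column vectors of X by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. Equation (A.6.1) then decomposes into n equations:

$$ A\mathbf{x}_i = \mathbf{e}_i , \qquad i = 1, 2, \ldots, n . \tag{A.6.4} $$

The right-hand sides are the basis vectors (A.2.3). For the case n = 2 we can write the system (A.6.4) in various equivalent ways, e.g.,

$$ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \qquad \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} , $$

or alternatively

$$ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \tag{A.6.5} $$

or as a system of four equations with four unknowns,

$$ \begin{aligned} A_{11}X_{11} + A_{12}X_{21} &= 1 , \\ A_{21}X_{11} + A_{22}X_{21} &= 0 , \\ A_{11}X_{12} + A_{12}X_{22} &= 0 , \\ A_{21}X_{12} + A_{22}X_{22} &= 1 . \end{aligned} \tag{A.6.6} $$

By elimination and substitution one easily finds

$$ X_{11} = \frac{A_{22}}{A_{11}A_{22} - A_{12}A_{21}} , \quad X_{12} = \frac{-A_{12}}{A_{11}A_{22} - A_{12}A_{21}} , \quad X_{21} = \frac{-A_{21}}{A_{11}A_{22} - A_{12}A_{21}} , \quad X_{22} = \frac{A_{11}}{A_{11}A_{22} - A_{12}A_{21}} , \tag{A.6.7} $$

or in matrix notation

$$ X = A^{-1} = \frac{1}{\det A} \begin{pmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{pmatrix} . \tag{A.6.8} $$

The matrix on the right-hand side is the adjoint of the original matrix, i.e.,

$$ A^{-1} = \frac{A^{\dagger}}{\det A} . \tag{A.6.9} $$

One can show that this relation holds for square matrices of arbitrary order. From (A.6.9) it is clear that the inverse of a singular matrix, i.e., of a matrix with vanishing determinant, is not determined.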
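Relation (A.6.9) can be checked with exact rational arithmetic. The Python sketch below uses hypothetical names and Python's standard `fractions` module. It builds the adjoint (A.4.7) from cofactors (A.4.5) and divides by det A. It is only an illustration of the formula, not a practical way to compute inverses.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row, (A.4.6)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det([row[:k] + row[k + 1:] for row in A[1:]])
               for k in range(len(A)))

def inverse_adjoint(A):
    """A^{-1} = A^dagger / det A, (A.6.9); exact for integer or Fraction entries."""
    n = len(A)
    d = Fraction(det(A))
    if d == 0:
        raise ValueError("singular matrix: the inverse is not determined")
    if n == 1:
        return [[1 / d]]
    def cofactor(i, j):
        """Cofactor (A.4.5): signed determinant with row i and column j deleted."""
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]
        return (-1) ** (i + j) * det(minor)
    # element (i, j) of the adjoint is the cofactor of element (j, i), cf. (A.4.7)
    return [[cofactor(j, i) / d for j in range(n)] for i in range(n)]

for row in inverse_adjoint([[4, 7], [2, 6]]):
    print(row)     # [Fraction(3, 5), Fraction(-7, 10)] then [Fraction(-1, 5), Fraction(2, 5)]
```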
In practice the inverse matrix is not computed using (A.6.9). Instead, as in our example of the 2 × 2 matrix, it is obtained by elimination and substitution from the system (A.6.4). This system consists of n sets of equations of the form

$$ A\mathbf{x} = \mathbf{b} , \tag{A.6.10} $$

each consisting of n equations with n unknowns. Here A is a non-singular n × n matrix. We will first give the solution algorithm for square non-singular matrices A, but we will later remove these restrictions.


A.7 Gaussian Elimination


We write Eq. (A.6.10) for an n × n matrix A in components,

$$ \begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n &= b_1 , \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n &= b_2 , \\
&\ \ \vdots \\
A_{n1}x_1 + A_{n2}x_2 + \cdots + A_{nn}x_n &= b_n ,
\end{aligned} \tag{A.7.1} $$

and solve the system by Gaussian elimination. For this we define n − 1 multipliers

$$ m_{i1} = \frac{A_{i1}}{A_{11}} , \qquad i = 2, 3, \ldots, n . \tag{A.7.2} $$

We multiply the first equation by $m_{21}$ and subtract it from the second. We then multiply the first equation by $m_{31}$ and subtract it from the third, and so forth. We obtain the system

$$ \begin{aligned}
A^{(1)}_{11}x_1 + A^{(1)}_{12}x_2 + \cdots + A^{(1)}_{1n}x_n &= b^{(1)}_1 , \\
A^{(2)}_{22}x_2 + \cdots + A^{(2)}_{2n}x_n &= b^{(2)}_2 , \\
&\ \ \vdots \\
A^{(2)}_{n2}x_2 + \cdots + A^{(2)}_{nn}x_n &= b^{(2)}_n ,
\end{aligned} \tag{A.7.3} $$

where the unknown $x_1$ has disappeared from all the equations except the first. The coefficients $A^{(2)}_{ij}$, $b^{(2)}_i$ are given by the equations

$$ A^{(2)}_{ij} = A^{(1)}_{ij} - m_{i1}A^{(1)}_{1j} , \qquad b^{(2)}_i = b^{(1)}_i - m_{i1}b^{(1)}_1 , $$

with $A^{(1)}_{ij} = A_{ij}$ and $b^{(1)}_i = b_i$. The procedure is then repeated with the last n − 1 equations. We define

$$ m_{i2} = \frac{A^{(2)}_{i2}}{A^{(2)}_{22}} , \qquad i = 3, 4, \ldots, n , $$

multiply the second equation of the system (A.7.3) by the corresponding $m_{i2}$, and subtract it from the third, fourth, ..., nth equation. In the kth step of the procedure, the multipliers

$$ m_{ik} = \frac{A^{(k)}_{ik}}{A^{(k)}_{kk}} , \qquad i = k+1, k+2, \ldots, n , \tag{A.7.4} $$

are used and the new coefficients

$$ A^{(k+1)}_{ij} = A^{(k)}_{ij} - m_{ik}A^{(k)}_{kj} , \qquad b^{(k+1)}_i = b^{(k)}_i - m_{ik}b^{(k)}_k \tag{A.7.5} $$

are computed. After n − 1 steps we have produced the following triangular system of equations:

$$ \begin{aligned}
A^{(1)}_{11}x_1 + A^{(1)}_{12}x_2 + \cdots + A^{(1)}_{1n}x_n &= b^{(1)}_1 , \\
A^{(2)}_{22}x_2 + \cdots + A^{(2)}_{2n}x_n &= b^{(2)}_2 , \\
&\ \ \vdots \\
A^{(n)}_{nn}x_n &= b^{(n)}_n .
\end{aligned} \tag{A.7.6} $$

The last equation contains only $x_n$. Substituting it into the next higher equation yields $x_{n-1}$, and in general

$$ x_i = \frac{1}{A^{(i)}_{ii}}\Big(b^{(i)}_i - \sum_{\ell=i+1}^{n} A^{(i)}_{i\ell}x_\ell\Big) , \qquad i = n, n-1, \ldots, 1 . \tag{A.7.7} $$

Note that the way the elements of the matrix A change does not depend on the right-hand side b. Therefore one can reduce several systems of equations with different right-hand sides simultaneously. That is, instead of Ax = b, one can solve the more general system

$$ AX = B \tag{A.7.8} $$

at the same time, where B and X are n × m matrices and X is unknown.
Example A.1: Inversion of a 3 × 3 matrix

As a numerical example let us consider the inversion of a 3 × 3 matrix, i.e., B = I. The individual computations for the example

$$ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -2 \\ 1 & 1 & 2 \end{pmatrix} X = I $$

are shown in Table A.2. The result is

$$ X = A^{-1} = \frac{1}{5}\begin{pmatrix} -4 & 1 & 7 \\ 6 & 1 & -8 \\ -1 & -1 & 3 \end{pmatrix} . $$

Following the individual steps of the calculation, one sees that division is carried out in two places, namely in Eqs. (A.7.4) and (A.7.7). In both cases the denominator is a coefficient

$$ A^{(i)}_{ii} , \qquad i = 1, 2, \ldots, n , $$

i.e., the upper left-hand coefficient of the system remaining after step i − 1 of the reduction process. This coefficient is called the pivot. Our procedure fails if this coefficient is equal to zero. In such a case one can simply exchange the ith row of the system with some lower row whose first coefficient is not zero. Exchanging two equations clearly does not change the system of equations itself. The procedure still fails if all of the coefficients of a column vanish. In this case the matrix A is singular, and there is no solution. In practice (at least when using computers, where the extra work is negligible) it is advantageous to always reorder the rows so that the pivot is the coefficient with the largest absolute value in the first column of the reduced system. One then always has the largest possible denominator, which keeps rounding errors small. The procedure just described is called Gaussian elimination with pivoting.

The method DatanMatrix.matrixEquation solves Eq. (A.7.8) by this procedure. In the same way the method DatanMatrix.inverse determines the inverse of a square non-singular matrix.
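A compact Python sketch of Gaussian elimination with pivoting for AX = B, (A.7.8), is given below. It follows (A.7.4), (A.7.5) and (A.7.7) and adds the row exchanges described above. The function names are hypothetical, and this is not the code of DatanMatrix.matrixEquation or DatanMatrix.inverse. An exact zero pivot is used as the failure test; a careful floating-point implementation would replace this by a tolerance.

```python
def solve_matrix_equation(A, B):
    """Solve A X = B for a non-singular n x n matrix A and an n x m matrix B
    (lists of rows) by Gaussian elimination with pivoting."""
    n, m = len(A), len(B[0])
    M = [[float(x) for x in row] for row in A]      # copies: inputs stay unchanged
    X = [[float(x) for x in row] for row in B]
    for k in range(n):
        # pivot: largest |coefficient| in column k of the reduced system
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[piv] = M[piv], M[k]                 # exchange two equations
        X[k], X[piv] = X[piv], X[k]
        for i in range(k + 1, n):
            mik = M[i][k] / M[k][k]                 # multiplier (A.7.4)
            for j in range(k, n):
                M[i][j] -= mik * M[k][j]            # (A.7.5), matrix part
            for j in range(m):
                X[i][j] -= mik * X[k][j]            # (A.7.5), right-hand sides
    for j in range(m):                              # back substitution (A.7.7)
        for i in range(n - 1, -1, -1):
            s = sum(M[i][l] * X[l][j] for l in range(i + 1, n))
            X[i][j] = (X[i][j] - s) / M[i][i]
    return X

def inverse(A):
    """Inverse as the solution of A X = I, cf. (A.6.4)."""
    n = len(A)
    return solve_matrix_equation(A, [[float(i == j) for j in range(n)] for i in range(n)])

for row in inverse([[1, 2, 3], [2, 1, -2], [1, 1, 2]]):
    print([round(5 * x, 10) for x in row])   # rows of 5X: -4 1 7 / 6 1 -8 / -1 -1 3
```

Because of the row exchanges, the intermediate numbers differ from Table A.2, which shows the reduction without pivoting (its first pivot is 1 rather than the larger 2). The final X is the same.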


A.8 LR-Decomposition


For the elements $A^{(k)}_{ij}$ transformed by Gaussian elimination [cf. (A.7.6)], one has on and above the main diagonal

$$ A^{(n)}_{ij} = A^{(n-1)}_{ij} = \cdots = A^{(i)}_{ij} , \qquad i \le j , \tag{A.8.1} $$
Table A.2: Application of Gaussian elimination to Example A.1.

Reduction
            Matrix A             Matrix B              Multiplier
Step 0      1     2     3        1     0     0         -
            2     1    −2        0     1     0         2
            1     1     2        0     0     1         1
Step 1           −3    −8       −2     1     0         -
                 −1    −1       −1     0     1         1/3
Step 2                  5/3     −1/3  −1/3   1         -

Substitution
j = 1:  $x_{3j} = (-1/3)/(5/3) = -1/5$
        $x_{2j} = -\tfrac{1}{3}(-2 - 8 \times (-1/5)) = 6/5$
        $x_{1j} = 1 - 2 \times 6/5 + 3 \times 1/5 = -4/5$
j = 2:  $x_{3j} = (-1/3)/(5/3) = -1/5$
        $x_{2j} = -\tfrac{1}{3}(1 - 8/5) = 1/5$
        $x_{1j} = -2 \times 1/5 + 3 \times 1/5 = 1/5$
j = 3:  $x_{3j} = 1/(5/3) = 3/5$
        $x_{2j} = -\tfrac{1}{3}(0 + 8 \times 3/5) = -8/5$
        $x_{1j} = 2 \times 8/5 - 3 \times 3/5 = 7/5$



and below the main diagonal

$$ A^{(n)}_{ij} = A^{(n-1)}_{ij} = \cdots = A^{(j+1)}_{ij} = 0 , \qquad i > j . \tag{A.8.2} $$

The transformation (A.7.5) is thus only carried out for k = 1, 2, ..., r with r = min(i − 1, j),

$$ A^{(k+1)}_{ij} = A^{(k)}_{ij} - m_{ik}A^{(k)}_{kj} . \tag{A.8.3} $$

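Summation over k gives

$$ \sum_{k=1}^{r} A^{(k+1)}_{ij} - \sum_{k=1}^{r} A^{(k)}_{ij} = A^{(r+1)}_{ij} - A^{(1)}_{ij} = -\sum_{k=1}^{r} m_{ik}A^{(k)}_{kj} . $$

Thus the elements $A^{(1)}_{ij} = A_{ij}$ of the original matrix A can be written in the form

$$ \begin{aligned}
A_{ij} &= A^{(i)}_{ij} + \sum_{k=1}^{i-1} m_{ik}A^{(k)}_{kj} , \qquad i \le j , \\
A_{ij} &= 0 + \sum_{k=1}^{j} m_{ik}A^{(k)}_{kj} , \qquad i > j .
\end{aligned} \tag{A.8.4} $$

So far the multipliers $m_{ik}$ have only been defined for i > k. We can therefore additionally set

$$ m_{ii} = 1 , \qquad i = 1, 2, \ldots, n , $$

and reduce the two relations (A.8.4) to one,

$$ A_{ij} = \sum_{k=1}^{p} m_{ik}A^{(k)}_{kj} , \qquad 1 \le i, j \le n , \quad p = \min(i, j) . $$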

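The equation shows directly that the matrix A can be represented as the product of two matrices L and R. Here L is a lower and R an upper triangular matrix,

$$A = LR, \tag{A.8.5}$$

$$L = \begin{pmatrix} m_{11} & & & \\ m_{21} & m_{22} & & \\ \vdots & & \ddots & \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}, \qquad m_{ii} = 1,$$

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{pmatrix} = \begin{pmatrix} A_{11}^{(1)} & A_{12}^{(1)} & \cdots & A_{1n}^{(1)} \\ & A_{22}^{(2)} & \cdots & A_{2n}^{(2)} \\ & & \ddots & \vdots \\ & & & A_{nn}^{(n)} \end{pmatrix}.$$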


The original system of equations (A.6.10)

$$A x = L R x = b$$

is thus equivalent to two triangular systems of equations,

$$L y = b, \qquad R x = y.$$

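Instead of resorting to the formulas from Sect. A.7, we can compute the elements of L and R directly from (A.8.5). We obtain

$$\left.\begin{aligned} r_{kj} &= A_{kj} - \sum_{l=1}^{k-1} m_{kl}\, r_{lj}, & j &= k, k+1, \ldots, n, \\ m_{ik} &= \Bigl(A_{ik} - \sum_{l=1}^{k-1} m_{il}\, r_{lk}\Bigr) \Big/ r_{kk}, & i &= k+1, \ldots, n, \end{aligned}\right\} \quad k = 1, \ldots, n, \tag{A.8.6}$$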

i.e., for $k = 1$ one computes the first row of R (which is equal to the first row of A) and the first column of L; for $k = 2$ one computes the second row of R and the second column of L. In a computer program the elements of R and L can overwrite the original matrix A with the same indices, since when computing $r_{kj}$ one only needs the element $A_{kj}$ and other elements of R and L that have already been computed according to (A.8.6). The corresponding consideration holds for the computation of $m_{ik}$.

The algorithm (A.8.6) is called the Doolittle LR-decomposition. When pivoting is included, it is clearly equivalent to Gaussian elimination.
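To make the Doolittle scheme (A.8.6) concrete, here is a minimal Java sketch. It is not the DatanMatrix implementation: the class name DoolittleSketch is hypothetical, indices are 0-based, and no pivoting is done, so a vanishing $r_{kk}$ raises an exception where the pivoting variant of Gaussian elimination would exchange rows. As in the text, R and the multipliers $m_{ik}$ (with $m_{kk} = 1$ implicit) share one array, here a copy of A.

```java
/**
 * Minimal sketch of the Doolittle LR-decomposition (A.8.6) without pivoting.
 * DoolittleSketch is a hypothetical name, not part of DatanMatrix. Indices are 0-based.
 */
final class DoolittleSketch {

    /**
     * Returns a copy of A overwritten by R (on and above the diagonal) and by the
     * multipliers m_ik of L (below the diagonal); m_kk = 1 is implicit.
     */
    static double[][] decompose(double[][] a) {
        int n = a.length;
        double[][] lr = new double[n][];
        for (int i = 0; i < n; i++) lr[i] = a[i].clone();
        for (int k = 0; k < n; k++) {
            // Row k of R: r_kj = A_kj - sum_{l<k} m_kl r_lj
            for (int j = k; j < n; j++) {
                double sum = lr[k][j];
                for (int l = 0; l < k; l++) sum -= lr[k][l] * lr[l][j];
                lr[k][j] = sum;
            }
            if (lr[k][k] == 0.0) {
                throw new ArithmeticException("zero pivot r_kk: pivoting would be required");
            }
            // Column k of L: m_ik = (A_ik - sum_{l<k} m_il r_lk) / r_kk
            for (int i = k + 1; i < n; i++) {
                double sum = lr[i][k];
                for (int l = 0; l < k; l++) sum -= lr[i][l] * lr[l][k];
                lr[i][k] = sum / lr[k][k];
            }
        }
        return lr;
    }

    /** Solves A x = b from the combined array: L y = b forward, then R x = y backward. */
    static double[] solve(double[][] lr, double[] b) {
        int n = lr.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = b[i];
            for (int k = 0; k < i; k++) sum -= lr[i][k] * y[k];
            y[i] = sum; // division by m_ii = 1 omitted
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = y[i];
            for (int k = i + 1; k < n; k++) sum -= lr[i][k] * x[k];
            x[i] = sum / lr[i][i];
        }
        return x;
    }
}
```

Applied to the matrix of Example A.1 (Table A.2), the array returned by decompose holds the multipliers 2, 1, and 1/3 below the diagonal and the rows (1, 2, 3), (−3, −8), and (5/3) of R on and above it; solve with b = (1, 0, 0) then gives (−4/5, 6/5, −1/5), the first column of the inverse.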


A.9  Cholesky  Decomposition 

If A is a real symmetric positive-definite matrix (cf. Sect. A.11), then it can be uniquely expressed as

$$A = U^{\mathrm T} U. \tag{A.9.1}$$

Here U is a real upper triangular matrix with positive diagonal elements. The $n \times n$ matrix A has the property required for (A.9.1) to hold, in particular, whenever it is the product of an arbitrary real $n \times m$ matrix B (with $m \ge n$ and full rank) with its transpose $B^{\mathrm T}$, i.e., $A = B B^{\mathrm T}$.

To determine U we first carry out the Doolittle LR-decomposition, define a diagonal matrix D whose diagonal elements are equal to those of R,

$$D = \operatorname{diag}(r_{11}, r_{22}, \ldots, r_{nn}), \tag{A.9.2}$$

and introduce the matrices

$$D^{-1} = \operatorname{diag}(r_{11}^{-1}, r_{22}^{-1}, \ldots, r_{nn}^{-1}) \tag{A.9.3}$$

and

$$D^{-1/2} = \operatorname{diag}(r_{11}^{-1/2}, r_{22}^{-1/2}, \ldots, r_{nn}^{-1/2}). \tag{A.9.4}$$

We can then write


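$$A = LR = L D D^{-1} R = L D R', \qquad R' = D^{-1} R. \tag{A.9.5}$$

Because of the assumed symmetry one has

$$A = A^{\mathrm T} = (R')^{\mathrm T} D L^{\mathrm T}. \tag{A.9.6}$$

Comparing with (A.9.5) gives $L^{\mathrm T} = R'$, i.e.,

$$L = R^{\mathrm T} D^{-1}.$$

With

$$U = D^{-1/2} R = D^{1/2} L^{\mathrm T} \tag{A.9.7}$$

Eq. (A.9.1) is indeed fulfilled, since $U^{\mathrm T} U = L D^{1/2} D^{-1/2} R = LR = A$. The relation (A.9.7) means for the elements $u_{ij}$ of U

$$u_{ij} = \frac{r_{ij}}{\sqrt{r_{ii}}} = m_{ji} \sqrt{r_{ii}}.$$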


Thus the values r and m can be eliminated in favor of u from (A.8.6), and we obtain with

$$\left.\begin{aligned} u_{kk} &= \Bigl(A_{kk} - \sum_{i=1}^{k-1} u_{ik}^2\Bigr)^{1/2}, & & \\ u_{kj} &= \Bigl(A_{kj} - \sum_{i=1}^{k-1} u_{ik}\, u_{ij}\Bigr) \Big/ u_{kk}, & j &= k+1, \ldots, n, \end{aligned}\right\} \quad k = 1, \ldots, n,$$

the algorithm for the Cholesky decomposition of a real positive-definite symmetric matrix A. Since for such matrices all of the diagonal elements are different from zero, one does not, in principle, need any pivoting. One can also show that pivoting would bring no advantage in numerical accuracy. The method DatanMatrix.choleskyDecomposition performs the Cholesky decomposition of a symmetric positive-definite square matrix.
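A direct transcription of the formulas for $u_{kk}$ and $u_{kj}$ into Java might look as follows. This is a sketch under the assumptions that the input is symmetric (only its upper triangle is read) and that a non-positive radicand signals a matrix that is not positive definite; CholeskySketch is a hypothetical name and not the DatanMatrix.choleskyDecomposition method.

```java
/**
 * Minimal sketch of the Cholesky decomposition A = U^T U of a real symmetric
 * positive-definite matrix, following the formulas for u_kk and u_kj.
 * CholeskySketch is a hypothetical name, not part of DatanMatrix. Indices are 0-based.
 */
final class CholeskySketch {

    /** Returns the upper triangular factor U; only the upper triangle of a is read. */
    static double[][] decompose(double[][] a) {
        int n = a.length;
        double[][] u = new double[n][n];
        for (int k = 0; k < n; k++) {
            // u_kk = sqrt(A_kk - sum_{i<k} u_ik^2)
            double diag = a[k][k];
            for (int i = 0; i < k; i++) diag -= u[i][k] * u[i][k];
            if (diag <= 0.0) {
                throw new ArithmeticException("matrix is not positive definite");
            }
            u[k][k] = Math.sqrt(diag);
            // u_kj = (A_kj - sum_{i<k} u_ik u_ij) / u_kk for j > k
            for (int j = k + 1; j < n; j++) {
                double sum = a[k][j];
                for (int i = 0; i < k; i++) sum -= u[i][k] * u[i][j];
                u[k][j] = sum / u[k][k];
            }
        }
        return u;
    }
}
```

For A = ((4, 2), (2, 3)) it returns U = ((2, 1), (0, √2)), and $U^{\mathrm T} U$ reproduces A.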

Positive-definite symmetric matrices play an important role as weight and covariance matrices. In such cases one often requires the Cholesky decomposition $A = U^{\mathrm T} U$ as well as the multiplication of U by a matrix. This multiplication is computationally cheaper if it is done by the method DatanMatrix.choleskyMultiply, which takes the triangular form of U into account.

Of particular interest is the inversion of a symmetric positive-definite $n \times n$ matrix A. For this we will solve the n matrix equations (A.6.4) with the previously described Cholesky decomposition of A,

$$A x_i = U^{\mathrm T} U x_i = U^{\mathrm T} y_i = e_i, \qquad i = 1, \ldots, n. \tag{A.9.8}$$


We denote the $\ell$th component of the three vectors $x_i$, $y_i$, and $e_i$ by $x_{i\ell}$, $y_{i\ell}$, and $e_{i\ell}$. Clearly one has $x_{i\ell} = (A^{-1})_{i\ell}$ and $e_{i\ell} = \delta_{i\ell}$, since $e_{i\ell}$ is the element $(i, \ell)$ of the unit matrix.
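We now determine $y_{i\ell}$ by means of forward substitution. From (A.9.8) it follows directly that

$$\sum_{k=1}^{\ell} U_{k\ell}\, y_{ik} = e_{i\ell} = \delta_{i\ell},$$

i.e.,

$$\begin{aligned} U_{11} y_{i1} &= \delta_{i1}, \\ U_{12} y_{i1} + U_{22} y_{i2} &= \delta_{i2}, \\ &\;\;\vdots \end{aligned}$$

and thus

$$y_{i1} = \delta_{i1} / U_{11}, \qquad y_{i\ell} = \frac{1}{U_{\ell\ell}} \Bigl(\delta_{i\ell} - \sum_{k=1}^{\ell-1} U_{k\ell}\, y_{ik}\Bigr).$$

Since, however, $\delta_{i\ell} = 0$ for $i \ne \ell$, this expression simplifies to

$$y_{i\ell} = 0, \quad \ell < i; \qquad y_{i\ell} = \frac{1}{U_{\ell\ell}} \Bigl(\delta_{i\ell} - \sum_{k=i}^{\ell-1} U_{k\ell}\, y_{ik}\Bigr), \quad \ell \ge i.$$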


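We can now obtain $x_{i\ell}$ by means of backward substitution of $y_{i\ell}$ into

$$U x_i = y_i,$$

or, written in components,

$$\sum_{k=\ell}^{n} U_{\ell k}\, x_{ik} = y_{i\ell},$$

or

$$\begin{aligned} U_{11} x_{i1} + U_{12} x_{i2} + \cdots + U_{1n} x_{in} &= y_{i1}, \\ U_{22} x_{i2} + \cdots + U_{2n} x_{in} &= y_{i2}, \\ &\;\;\vdots \\ U_{nn} x_{in} &= y_{in}. \end{aligned}$$

We obtain

$$x_{in} = y_{in} / U_{nn}, \qquad x_{i\ell} = \frac{1}{U_{\ell\ell}} \Bigl(y_{i\ell} - \sum_{k=\ell+1}^{n} U_{\ell k}\, x_{ik}\Bigr), \quad \ell < n.$$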


If we compute $x_{i\ell}$ only for $\ell \ge i$, then by backward substitution one only encounters the elements $y_{i\ell}$ for $\ell \ge i$, i.e., only non-vanishing $y_{i\ell}$. The vanishing elements $y_{i\ell}$ thus do not need to be stored. The elements $x_{i\ell}$ for $\ell < i$ follow simply from the symmetry of the original matrix, $x_{i\ell} = x_{\ell i}$.

The method DatanMatrix.choleskyInversion first performs the Cholesky decomposition of A. Then a loop is carried out over the various right-hand sides $i = 1, \ldots, n$ of (A.9.8). Each pass through the loop yields the elements $x_{in}, \ldots, x_{ii}$ of row i. They are stored as the corresponding elements of the output matrix A. The elements of the vector y, which is only used for intermediate results, can be stored in the last row of A. Finally the elements below the main diagonal are filled by copies of their mirror images.
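The inversion scheme just described can be sketched in Java as follows. It follows (A.9.8) and the simplifications above (forward substitution starts at $\ell = i$, backward substitution stops at $\ell = i$, and the lower triangle is filled by symmetry), but, unlike the description of DatanMatrix.choleskyInversion, it writes into a separate output array instead of reusing the storage of A. CholeskyInverseSketch is a hypothetical name.

```java
/**
 * Minimal sketch of inverting a symmetric positive-definite matrix via A = U^T U and (A.9.8).
 * CholeskyInverseSketch is a hypothetical name, not part of DatanMatrix. Indices are 0-based.
 */
final class CholeskyInverseSketch {

    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] u = cholesky(a);
        double[][] x = new double[n][n];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            // Forward substitution U^T y_i = e_i, starting at l = i because y_il = 0 for l < i.
            for (int l = i; l < n; l++) {
                double sum = (l == i) ? 1.0 : 0.0; // delta_il
                for (int k = i; k < l; k++) sum -= u[k][l] * y[k];
                y[l] = sum / u[l][l];
            }
            // Backward substitution U x_i = y_i, computing only x_il for l >= i.
            for (int l = n - 1; l >= i; l--) {
                double sum = y[l];
                for (int k = l + 1; k < n; k++) sum -= u[l][k] * x[i][k];
                x[i][l] = sum / u[l][l];
            }
        }
        // Elements below the diagonal are mirror images: x_il = x_li.
        for (int i = 1; i < n; i++) {
            for (int l = 0; l < i; l++) x[i][l] = x[l][i];
        }
        return x;
    }

    /** Upper triangular Cholesky factor U with A = U^T U. */
    private static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] u = new double[n][n];
        for (int k = 0; k < n; k++) {
            double diag = a[k][k];
            for (int i = 0; i < k; i++) diag -= u[i][k] * u[i][k];
            if (diag <= 0.0) throw new ArithmeticException("matrix is not positive definite");
            u[k][k] = Math.sqrt(diag);
            for (int j = k + 1; j < n; j++) {
                double sum = a[k][j];
                for (int i = 0; i < k; i++) sum -= u[i][k] * u[i][j];
                u[k][j] = sum / u[k][k];
            }
        }
        return u;
    }
}
```

For A = ((4, 2), (2, 3)) the result is ((0.375, −0.25), (−0.25, 0.5)), i.e., (1/8)((3, −2), (−2, 4)).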


A.10  Pseudo-inverse  Matrix 


We now return to the problem of Sect. A.5, that of solving Eq. (A.5.4),

$$A x \approx b, \tag{A.10.1}$$

for an arbitrary $m \times n$ matrix A of rank k. According to (A.5.14), the unique solution of minimum norm is

$$x = K \begin{pmatrix} p_1 \\ 0 \end{pmatrix}.$$

The vector $p_1$ is the solution of Eq. (A.5.11) and therefore

$$p_1 = R_{11}^{-1} g_1,$$

since $R_{11}$ is non-singular. Because of (A.5.8) one has finally

$$x = K \begin{pmatrix} R_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} H^{\mathrm T} b. \tag{A.10.2}$$

In analogy to (A.6.3) we write

$$x = A^{+} b \tag{A.10.3}$$

and call the $n \times m$ matrix

$$A^{+} = K \begin{pmatrix} R_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} H^{\mathrm T} \tag{A.10.4}$$

the pseudo-inverse of the $m \times n$ matrix A.

The matrix $A^{+}$ is uniquely determined by A and does not depend on the particular orthogonal decomposition (A.5.5). This can easily be seen if one denotes the jth column vector of $A^{+}$ by $a_j^{+} = A^{+} e_j$, with $e_j$ the jth column vector of the m-dimensional unit matrix. According to (A.10.3) the vector $a_j^{+}$ is the minimum-length solution of the equation $A a_j^{+} \approx e_j$, and is therefore unique.

A.11 Eigenvalues and Eigenvectors

We now consider the eigenvalue equation

$$G x = \lambda x \tag{A.11.1}$$

for the $n \times n$ matrix G. If this is fulfilled, then the scalar λ is called the eigenvalue and the n-vector x the eigenvector of G. Clearly the eigenvector x is only determined up to an arbitrary factor. One can choose this factor such that $|x| = 1$.

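We consider first the particularly simple case where G is a diagonal matrix with non-negative diagonal elements,

$$G = S^{\mathrm T} S = S^2 = \begin{pmatrix} s_1^2 & & & \\ & s_2^2 & & \\ & & \ddots & \\ & & & s_n^2 \end{pmatrix}, \tag{A.11.2}$$

which can be expressed as the product of an arbitrary diagonal matrix S with itself,

$$S = \begin{pmatrix} s_1 & & & \\ & s_2 & & \\ & & \ddots & \\ & & & s_n \end{pmatrix}. \tag{A.11.3}$$

The eigenvalue equation $S^2 x = \lambda x$ then has the eigenvalues $\lambda_i = s_i^2$, and the normalized eigenvectors are the basis vectors $x_i = e_i$.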

In place of S we now set

$$A = U S V^{\mathrm T} \tag{A.11.4}$$

with orthogonal matrices U and V and

$$G = A^{\mathrm T} A = V S^{\mathrm T} S V^{\mathrm T}. \tag{A.11.5}$$

We can write the eigenvalue equation of G in the form

$$G x = \lambda x$$

or

$$V S^{\mathrm T} S V^{\mathrm T} x = \lambda x. \tag{A.11.6}$$

Multiplying on the left with $V^{\mathrm T}$,

$$S^{\mathrm T} S V^{\mathrm T} x = \lambda V^{\mathrm T} x,$$

and comparing with (A.11.1) and (A.11.2) shows that G has the same eigenvalues $\lambda_i = s_i^2$ as $S^2$, but has the orthogonally transformed eigenvectors

$$x_i = e'_i = V e_i. \tag{A.11.7}$$

One can clearly find the eigenvalues and eigenvectors of G if one knows the orthogonal matrix V that transforms G into a diagonal matrix,

$$V^{\mathrm T} G V = S^{\mathrm T} S = S^2. \tag{A.11.8}$$

The transformation (A.11.8) is called a principal axis transformation. The name becomes clear by considering the equation

$$r^{\mathrm T} G r = 1. \tag{A.11.9}$$

We are interested in the locus of all points r that fulfill (A.11.9).

For the vector r the following two representations are completely equivalent:

$$r = \sum_i r_i e_i \tag{A.11.10}$$

and

$$r = \sum_i r'_i e'_i, \tag{A.11.11}$$

with the components $r_i$ and $r'_i$ taken with respect to the basis vectors $e_i$ and $e'_i$, respectively. With the representation (A.11.11) and by using (A.11.5) and (A.11.7) we obtain


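$$\begin{aligned} 1 = r^{\mathrm T} G r &= r^{\mathrm T} V S^2 V^{\mathrm T} r \\ &= \Bigl(\sum_i r'_i e'_i\Bigr)^{\mathrm T} V S^2 V^{\mathrm T} \Bigl(\sum_j r'_j e'_j\Bigr) \\ &= \Bigl(\sum_i r'_i e_i^{\mathrm T} V^{\mathrm T}\Bigr) V S^2 V^{\mathrm T} \Bigl(\sum_j r'_j V e_j\Bigr) \\ &= \Bigl(\sum_i r'_i e_i^{\mathrm T}\Bigr) S^2 \Bigl(\sum_j r'_j e_j\Bigr) \end{aligned}$$

and finally

$$\sum_{i=1}^{n} r_i'^2 s_i^2 = \sum_{i=1}^{n} \frac{r_i'^2}{a_i^2} = 1. \tag{A.11.12}$$

This is clearly the equation of an ellipsoid in n dimensions with half-diameters in the directions $e'_i$ and having the lengths

$$a_i = \frac{1}{s_i}. \tag{A.11.13}$$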

The vectors

$$\mathbf{a}_i = e'_i / s_i = V e_i / s_i \tag{A.11.14}$$

are the principal axes of the ellipsoid. They have the directions of the eigenvectors of G. Their lengths $a_i = 1/\sqrt{s_i^2}$ are determined by the eigenvalues $s_i^2$. The matrix

$$C = G^{-1} = (A^{\mathrm T} A)^{-1} = V (S^2)^{-1} V^{\mathrm T}$$

clearly has the same eigenvectors as G, but has the eigenvalues $1/s_i^2$. The lengths of the half-diameters of the ellipsoid described above are then directly equal to the square roots of the eigenvalues of C.

The matrix C is called the unweighted covariance matrix of A. The ellipsoid is called the unweighted covariance ellipsoid.

Please note that all considerations were done for matrices of the type (A.11.5), which, by construction, are symmetric and non-negative definite, i.e., they have real, non-negative eigenvalues. Ellipsoids with finite semiaxes are obtained only with positive-definite matrices. If the same eigenvalue occurs several times, then the ellipsoid has several principal axes of the same length. In this case there is a certain ambiguity in the determination of the principal axes, which, however, can always be chosen orthogonal to each other.

Up to now we have not given any prescription for finding the eigenvalues. The eigenvalue equation is $G x = \lambda x$ or

$$(G - \lambda I) x = 0. \tag{A.11.15}$$

Written in this way it can be considered as a linear system of equations for determining x, for which the right-hand side is the null vector. Corresponding to (A.6.5) and (A.6.9) there is a non-trivial solution only for

$$\det(G - \lambda I) = 0. \tag{A.11.16}$$

This is the characteristic equation for determining the eigenvalues of G. For $n = 2$ this is



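$$\begin{vmatrix} g_{11} - \lambda & g_{12} \\ g_{21} & g_{22} - \lambda \end{vmatrix} = (g_{11} - \lambda)(g_{22} - \lambda) - g_{12} g_{21} = 0$$

and has the solutions

$$\lambda_{1,2} = \frac{g_{11} + g_{22}}{2} \pm \sqrt{g_{12} g_{21} + \frac{(g_{11} - g_{22})^2}{4}}.$$

If G is symmetric, as previously assumed, i.e., $g_{12} = g_{21}$, then the eigenvalues are real. If G is positive definite, they are positive.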

The characteristic equation of an $n \times n$ matrix has n solutions. In practice, however, for $n > 2$ one does not find the eigenvalues by using the characteristic equation, but rather by means of an iterative procedure such as the singular value decomposition (see Sect. A.12).
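For $n = 2$ the solution of the characteristic equation can be turned into a small routine. The following Java sketch (hypothetical class SymmetricEigen2x2) assumes a symmetric G, so that only $g_{11}$, $g_{12} = g_{21}$, and $g_{22}$ are passed. It returns the eigenvalues in non-increasing order together with unit eigenvectors, and it computes the half-diameters $1/\sqrt{\lambda_i}$ of the ellipse $r^{\mathrm T} G r = 1$ according to (A.11.13). It is meant as an illustration only: the subtraction mean − root can lose relative accuracy when G is nearly singular.

```java
/**
 * Minimal sketch: eigenvalues and unit eigenvectors of a real symmetric 2x2 matrix
 * G = ((g11, g12), (g12, g22)) from the characteristic equation (A.11.16), and the
 * half-diameters of the ellipse r^T G r = 1. SymmetricEigen2x2 is a hypothetical name.
 */
final class SymmetricEigen2x2 {

    /** Returns {lambda_1, lambda_2} with lambda_1 >= lambda_2. */
    static double[] eigenvalues(double g11, double g12, double g22) {
        double mean = 0.5 * (g11 + g22);
        double half = 0.5 * (g11 - g22);
        double root = Math.sqrt(g12 * g12 + half * half); // real because g12 = g21
        return new double[] {mean + root, mean - root};
    }

    /** Unit eigenvector belonging to the eigenvalue lambda. */
    static double[] eigenvector(double g11, double g12, double g22, double lambda) {
        // Each row of (G - lambda I) x = 0 yields a candidate; take the longer one.
        double[] fromRow1 = {g12, lambda - g11};
        double[] fromRow2 = {lambda - g22, g12};
        double[] v = Math.hypot(fromRow1[0], fromRow1[1]) >= Math.hypot(fromRow2[0], fromRow2[1])
                ? fromRow1 : fromRow2;
        double norm = Math.hypot(v[0], v[1]);
        if (norm == 0.0) return new double[] {1.0, 0.0}; // G = lambda I: any direction works
        return new double[] {v[0] / norm, v[1] / norm};
    }

    /** Half-diameters a_i = 1 / sqrt(lambda_i) of the ellipse r^T G r = 1. */
    static double[] halfDiameters(double g11, double g12, double g22) {
        double[] lambda = eigenvalues(g11, g12, g22);
        if (lambda[1] <= 0.0) throw new IllegalArgumentException("G must be positive definite");
        return new double[] {1.0 / Math.sqrt(lambda[0]), 1.0 / Math.sqrt(lambda[1])};
    }
}
```

For G = ((2, 1), (1, 2)) one obtains λ = 3 and 1, the eigenvector (1, 1)/√2 for λ = 3, and half-diameters 1/√3 and 1; the longer axis belongs to the smaller eigenvalue.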


A.12  Singular  Value  Decomposition 

We now consider a particular orthogonal decomposition of the $m \times n$ matrix A,

$$A = U S V^{\mathrm T}. \tag{A.12.1}$$

Here and in Sects. A.13 and A.14 we assume that $m \ge n$. If this is not the case, then one can simply extend the matrix A with further rows whose elements are all zero until it has n rows, so that $m = n$. The decomposition (A.12.1) is a special case of the decomposition (A.5.5). The $m \times n$ matrix S, which takes the place of R, has the special form

$$S = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}, \tag{A.12.2}$$

and D is a $k \times k$ diagonal matrix with $k = \operatorname{rank}(A)$. The diagonal elements of S are called the singular values of A. If A is of full rank, then $k = n$ and all $s_i \ne 0$. For reduced rank $k < n$ one has $s_i = 0$ for $i > k$. We will see below that U and V can be determined such that all non-vanishing $s_i$ are positive and ordered to be non-increasing,

$$s_1 \ge s_2 \ge \cdots \ge s_k. \tag{A.12.3}$$

The singular values of A have a very simple meaning. From Sect. A.11 one has directly that the $s_i$ are the square roots of the eigenvalues of $G = A^{\mathrm T} A$. Thus the half-diameters of the covariance ellipsoid of G are $a_i = 1/s_i$. If G is singular, then A has the reduced rank $k < n$, and the $n - k$ singular values $s_{k+1}, \ldots, s_n$ vanish. In this case the determinant

$$\det G = \det V \, \det(S^{\mathrm T} S) \, \det V^{\mathrm T} = \det(S^{\mathrm T} S) = s_1^2 s_2^2 \cdots s_n^2 \tag{A.12.4}$$

also vanishes.

With the substitutions $H \to U$, $K \to V$, $R \to S$, $R_{11} \to D$ from Sect. A.5 we obtain


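$$A x = U S V^{\mathrm T} x \approx b, \tag{A.12.5}$$

i.e.,

$$S V^{\mathrm T} x \approx U^{\mathrm T} b. \tag{A.12.6}$$

With

$$V^{\mathrm T} x = p = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} \begin{matrix} \}\,k \\ \}\,n-k \end{matrix}, \qquad U^{\mathrm T} b = g = \begin{pmatrix} g_1 \\ g_2 \end{pmatrix} \begin{matrix} \}\,k \\ \}\,m-k \end{matrix} \tag{A.12.7}$$

one has

$$S p \approx g, \tag{A.12.8}$$

i.e.,

$$D p_1 = g_1 \tag{A.12.9}$$

and

$$0 \cdot p_2 \approx g_2. \tag{A.12.10}$$

These have the solutions

$$p_1 = D^{-1} g_1, \tag{A.12.11}$$

i.e.,

$$p_i = g_i / s_i, \qquad i = 1, 2, \ldots, k,$$

for arbitrary $p_2$. The solution with minimum absolute value is

$$x = V \begin{pmatrix} p_1 \\ 0 \end{pmatrix}. \tag{A.12.12}$$

The residual vector has the form

$$r = b - A x = U \begin{pmatrix} 0 \\ g_2 \end{pmatrix} \tag{A.12.13}$$

and the absolute value

$$r = |g_2| = \Bigl(\sum_{i=k+1}^{m} g_i^2\Bigr)^{1/2}. \tag{A.12.14}$$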
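Once U, S, and V are known, the steps (A.12.7) to (A.12.14) are only a few matrix-vector products. The Java sketch below (hypothetical class SvdSolveSketch) assumes that the decomposition has already been computed, with U of dimension m × m, the diagonal of S passed as an array s, V of dimension n × n, and the rank k given. It returns $x = V (p_1, 0)$ and, separately, the residual length $|g_2|$.

```java
/**
 * Minimal sketch of (A.12.7)-(A.12.14) for an already computed decomposition A = U S V^T.
 * SvdSolveSketch is a hypothetical name. Indices are 0-based.
 */
final class SvdSolveSketch {

    /** g = U^T b. */
    static double[] transformRhs(double[][] u, double[] b) {
        int m = u.length;
        double[] g = new double[m];
        for (int i = 0; i < m; i++) {
            for (int r = 0; r < m; r++) g[i] += u[r][i] * b[r];
        }
        return g;
    }

    /** Minimum-norm solution x = V (p_1, 0) with p_i = g_i / s_i for i < k. */
    static double[] solve(double[][] u, double[] s, double[][] v, int k, double[] b) {
        double[] g = transformRhs(u, b);
        int n = v.length;
        double[] x = new double[n];
        for (int i = 0; i < k; i++) {
            double p = g[i] / s[i];
            for (int r = 0; r < n; r++) x[r] += v[r][i] * p; // add p_i times column i of V
        }
        return x;
    }

    /** Residual length |g_2| = sqrt(sum of g_i^2 over i >= k). */
    static double residual(double[][] u, int k, double[] b) {
        double[] g = transformRhs(u, b);
        double sum = 0.0;
        for (int i = k; i < g.length; i++) sum += g[i] * g[i];
        return Math.sqrt(sum);
    }
}
```

With the matrices of Example A.3 in Sect. A.13 and, say, α = 0.5, solve gives x = (1, 1) for k = 2 and x = (0, 1) for k = 1; the residual is 0 in the first case and α√2 in the second.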


A.13  Singular  Value  Analysis 

The rank k of the matrix A plays a decisive role in finding the solution of $A x \approx b$. Here k characterizes the transition from non-zero to vanishing singular values, $s_k > 0$, $s_{k+1} = 0$. How should one judge the case of very small values of $s_k$, i.e., $s_k < \varepsilon$ for a given small ε?
Example A.2: Almost vanishing singular values

As a simple example we consider the case $m = n = 2$, $U = V = I$. One then has

$$A x = U S V^{\mathrm T} x = S x = \begin{pmatrix} s_1 & 0 \\ 0 & s_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \tag{A.13.1}$$

that is,

$$x_1 = b_1 / s_1, \qquad x_2 = b_2 / s_2. \tag{A.13.2}$$

If we now take $s_2 \to 0$ in (A.13.2), then $|x_2| \to \infty$. At first glance one obtains a completely different picture if one sets $s_2 = 0$ in (A.13.1) directly. This gives

$$s_1 x_1 = b_1, \qquad 0 \cdot x_2 = b_2. \tag{A.13.3}$$

Thus $x_1 = b_1 / s_1$ as in (A.13.2), but $x_2$ is completely undetermined. The solution x of minimum absolute value is obtained by setting $x_2 = 0$. The question is now, “What is right? $x_2 = \infty$ or $x_2 = 0$?” The answer is that one should set

$$x_2 = \begin{cases} b_2 / s_2, & s_2 \ge \varepsilon, \\ 0, & s_2 < \varepsilon, \end{cases}$$

and choose the parameter ε such that the expression $b_2 / \varepsilon$ is still numerically well-defined. This means that one must have $\varepsilon / |b_2| \gg 2^{-m}$ if m binary digits are available for the representation of a floating-point number (cf. Sect. 4.2). Thus a finite value of $x_2$ is computed as long as the numerical determination of this value is reasonable. If this is not the case, then one approaches the situation $s_2 = 0$, where $x_2$ is completely undetermined, and one sets $x_2 = 0$. ■


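Example A.3: Point of intersection of two almost parallel lines

We consider the two lines shown in Fig. A.6, which are described by

$$\begin{aligned} -\alpha x_1 + x_2 &= 1 - \alpha, \\ \alpha x_1 + x_2 &= 1 + \alpha. \end{aligned}$$

Fig. A.6: Two lines intersect at the point (1, 1) with an angle $2\gamma = 2 \arctan \alpha$.

For the vector x of the intersection point one has $A x = b$ with

$$A = \begin{pmatrix} -\alpha & 1 \\ \alpha & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 - \alpha \\ 1 + \alpha \end{pmatrix}.$$

One can easily check that $A = U S V^{\mathrm T}$ holds with

$$U = \frac{1}{\sqrt 2} \begin{pmatrix} -1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad S = \sqrt 2 \begin{pmatrix} 1 & 0 \\ 0 & \alpha \end{pmatrix}, \qquad V = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

i.e., $s_1 = \sqrt 2$, $s_2 = \alpha \sqrt 2$. Using

$$g = U^{\mathrm T} b = \sqrt 2 \begin{pmatrix} -1 \\ \alpha \end{pmatrix} = S p = \sqrt 2 \begin{pmatrix} p_1 \\ \alpha p_2 \end{pmatrix}$$

one obtains

$$p = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad x = V p = \begin{pmatrix} 1 \\ 1 \end{pmatrix},$$

independent of α.

If, however, one has $s_2 = \alpha \sqrt 2 < \varepsilon$ and then sets $s_2 = 0$, then one obtains

$$p = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \qquad x = V p = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

From Fig. A.6 one can see that for $\alpha \to 0$ the two lines come together to a single line described by $x_2 = 1$. The $x_1$-coordinate of the “intersection point” is completely undetermined. It is set equal to zero, since the solution vector x has minimum length [cf. (A.5.14)].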

As in the case of “indirect measurements” in Chap. 9 we now assume that the vector b is equal to the vector of measurements y, which characterize the two lines, and that their measurement errors are given by the covariance matrix $C_y = G_y^{-1}$. In the simplest case of equal uncorrelated errors one has

$$C_y = G_y^{-1} = \sigma^2 I.$$

The covariance matrix for the unknowns x is according to (9.2.27)

$$C_x = (A^{\mathrm T} G_y A)^{-1},$$

and thus in the case of uncorrelated measurement errors,

$$C_x = \sigma^2 (A^{\mathrm T} A)^{-1} = \sigma^2 C.$$

Up to the factor $\sigma^2$ it is thus equal to the unweighted covariance matrix $C = (A^{\mathrm T} A)^{-1}$ for the matrix A. For our matrix A one has

$$C = (A^{\mathrm T} A)^{-1} = \begin{pmatrix} 1/(2\alpha^2) & 0 \\ 0 & 1/2 \end{pmatrix}.$$

The corresponding ellipse has the half-diameters $e_1/(\alpha\sqrt 2)$ and $e_2/\sqrt 2$. For the case of equal uncorrelated measurement errors, the covariance ellipse of x then has the same half-diameters, multiplied, however, by the factor σ. They have the lengths $\sigma_{x_1} = \sigma/(\alpha\sqrt 2)$, $\sigma_{x_2} = \sigma/\sqrt 2$. Clearly one then sets $x_1 = 0$ if the inequality $x_1 \ll \sigma_{x_1}$ would hold for a finite fixed $x_1$, i.e., $\alpha\sqrt 2 \ll \sigma$. ■


The decision as to whether a small singular value should be set equal to zero thus depends on numerical considerations and on a consideration of the measurement errors. The following fairly general procedure of singular value analysis has proven to be useful in practice.

1. With a computer program one carries out the singular value decomposition, which yields, among other things, the ordered singular values
$$s_1 \ge s_2 \ge \cdots \ge s_k.$$
2. Depending on the problem at hand one chooses a positive factor $f \ll 1$.
3. All singular values for which $s_i < f s_1$ are set equal to zero. In place of k one thus has $\ell \le k$ such that $s_i = 0$ for $i > \ell$.
4. With the replacements described above ($k \to \ell$, $s_{\ell+1} = \cdots = s_k = 0$) the formulas of Sect. A.12 retain their validity.

In place of the residual (A.12.14) one obtains a somewhat larger value, since in the sum in the expression

$$r = \Bigl(\sum_{i=\ell+1}^{m} g_i^2\Bigr)^{1/2} \tag{A.13.4}$$

one has more terms than in (A.12.14).
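The four-step procedure can be sketched in Java as follows. The sketch (hypothetical class SingularValueAnalysisSketch) again assumes that U, the ordered singular values s, and V are already available. It determines ℓ from the factor f, solves with ℓ in place of k, and accumulates the residual (A.13.4) from the components $g_i$ with $i > \ell$.

```java
/**
 * Minimal sketch of the singular value analysis of Sect. A.13 for a given
 * decomposition A = U S V^T with s_1 >= s_2 >= ... . SingularValueAnalysisSketch
 * is a hypothetical name. Indices are 0-based.
 */
final class SingularValueAnalysisSketch {

    /** Solution vector, effective rank l, and residual length r. */
    static final class Result {
        final double[] x;
        final int rank;
        final double residual;

        Result(double[] x, int rank, double residual) {
            this.x = x;
            this.rank = rank;
            this.residual = residual;
        }
    }

    /** Number l of singular values kept, i.e., with s_i > 0 and s_i >= f * s_1. */
    static int effectiveRank(double[] s, double f) {
        int l = 0;
        while (l < s.length && s[l] > 0.0 && s[l] >= f * s[0]) l++;
        return l;
    }

    static Result solve(double[][] u, double[] s, double[][] v, double[] b, double f) {
        int m = u.length, n = v.length;
        int l = effectiveRank(s, f);
        double[] x = new double[n];
        double residualSq = 0.0;
        for (int i = 0; i < m; i++) {
            double gi = 0.0; // g_i = (U^T b)_i
            for (int r = 0; r < m; r++) gi += u[r][i] * b[r];
            if (i < l) {
                double p = gi / s[i];
                for (int r = 0; r < n; r++) x[r] += v[r][i] * p;
            } else {
                residualSq += gi * gi; // terms of (A.13.4)
            }
        }
        return new Result(x, l, Math.sqrt(residualSq));
    }
}
```

In the setting of Example A.2, with s = (1, 10^−10) and b = (2, 3), a factor f = 10^−6 gives ℓ = 1, x = (2, 0), and residual 3, whereas f = 10^−12 keeps both singular values and gives $x_2 = 3 \times 10^{10}$.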

The procedure implies that in some cases one has an effective reduction of the rank k of the matrix A to a value $\ell < k$. Even if A has full rank, its effective rank can be reduced in this way. This has the great advantage that numerical difficulties with small singular values are avoided. In contrast to, for example, the Gaussian or Cholesky procedures, the user does not have to worry about whether $G = A^{\mathrm T} A$ is singular or nearly singular.

Although the singular value analysis always gives a solution x of minimum absolute value for the problem $A x \approx b$, caution is recommended for the case $\ell < n$ (regardless of whether $k < n$ or $\ell < k = n$). In such a case one has an (almost) linear dependence of the unknowns $x_1, \ldots, x_n$. The solution x is not the only solution of the problem, but rather is simply the solution with the smallest absolute value out of the many possible solutions.

We have already remarked in Sect. A.5 that the singular value decomposition is advantageous also with respect to numerical precision compared to other methods, especially to the method of normal equations. A detailed discussion (see, e.g., [18]) is beyond the scope of this book. Here we limit ourselves to giving an example.


Example A.4: Numerical superiority of the singular value decomposition compared to the solution of normal equations

Consider the problem $A x \approx b$ for

$$A = \begin{pmatrix} 1 & 1 \\ \beta & 0 \\ 0 & \beta \end{pmatrix}.$$

The singular values of A are the square roots of the eigenvalues of

$$G = A^{\mathrm T} A = \begin{pmatrix} 1 + \beta^2 & 1 \\ 1 & 1 + \beta^2 \end{pmatrix}$$

and are determined by (A.11.16) to be

$$s_1 = \sqrt{2 + \beta^2}, \qquad s_2 = |\beta|.$$

If the singular value decomposition is carried out without using the matrix G, and if the computing precision is ε with $\beta^2 < \varepsilon$ but $\beta > \varepsilon$, then one obtains

$$s_1 = \sqrt 2, \qquad s_2 = |\beta|,$$

i.e., both singular values remain non-zero. If instead of the singular value decomposition one uses the normal equations

$$A^{\mathrm T} A x = G x = A^{\mathrm T} b,$$

then the matrix G appears explicitly. With $\beta^2 < \varepsilon$ it is numerically represented as

$$G = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

This matrix is singular, $\det G = 0$, and cannot be inverted, as foreseen in (A.5.17). This is also reflected in the singular values obtained from it,

$$s_1 = \sqrt 2, \qquad s_2 = 0. \quad ■$$
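The effect in Example A.4 can be reproduced in double precision, where the unit roundoff is about 1.1 × 10^−16, by choosing β = 10^−9: β is representable, but 1 + β² rounds to 1. The Java sketch below (hypothetical class NormalEquationsDemo) compares two routes. The first forms G = $A^{\mathrm T}A$ and takes square roots of its eigenvalues from the 2 × 2 characteristic equation. The second is a stand-in for an orthogonal-transformation method, not the algorithm of Sect. A.14: it applies one plane rotation of a one-sided Jacobi scheme to the two columns of A. It does compute the entries of G, but only to choose the rotation angle; the singular values are then taken as the norms of the rotated columns of A itself, which for two columns are exact after a single rotation.

```java
/**
 * Illustration of Example A.4 (hypothetical class name). (a) Normal-equation route: form
 * G = A^T A, then s_i = sqrt(eigenvalues of G). (b) One rotation of a one-sided Jacobi
 * scheme on the columns of the m x 2 matrix A; this is NOT the SVD algorithm of Sect. A.14.
 */
final class NormalEquationsDemo {

    /** The 3 x 2 matrix of Example A.4. */
    static double[][] exampleA4(double beta) {
        return new double[][] {{1, 1}, {beta, 0}, {0, beta}};
    }

    static double[] viaNormalEquations(double[][] a) {
        double g11 = 0, g12 = 0, g22 = 0;
        for (double[] row : a) {
            g11 += row[0] * row[0];
            g12 += row[0] * row[1];
            g22 += row[1] * row[1];
        }
        double mean = 0.5 * (g11 + g22), half = 0.5 * (g11 - g22);
        double root = Math.sqrt(g12 * g12 + half * half);
        return new double[] {Math.sqrt(mean + root), Math.sqrt(Math.max(mean - root, 0.0))};
    }

    static double[] viaColumnRotation(double[][] a) {
        double aa = 0, bb = 0, ab = 0; // Gram entries, used only to choose the angle
        for (double[] row : a) {
            aa += row[0] * row[0];
            bb += row[1] * row[1];
            ab += row[0] * row[1];
        }
        double c = 1.0, s = 0.0;
        if (ab != 0.0) {
            double zeta = (bb - aa) / (2.0 * ab);
            double t = (zeta >= 0 ? 1.0 : -1.0) / (Math.abs(zeta) + Math.sqrt(1.0 + zeta * zeta));
            c = 1.0 / Math.sqrt(1.0 + t * t);
            s = c * t;
        }
        // Rotate the columns themselves; afterwards they are orthogonal and their norms are the s_i.
        double n0 = 0, n1 = 0;
        for (double[] row : a) {
            double p = c * row[0] - s * row[1];
            double q = s * row[0] + c * row[1];
            n0 += p * p;
            n1 += q * q;
        }
        double s0 = Math.sqrt(n0), s1 = Math.sqrt(n1);
        return new double[] {Math.max(s0, s1), Math.min(s0, s1)};
    }
}
```

With β = 10^−9, viaNormalEquations returns $s_2 = 0$, while viaColumnRotation returns $s_2 \approx 10^{-9}$; both give $s_1 \approx \sqrt 2$.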
A.14 Algorithm for Singular Value Decomposition

A.14.1 Strategy

We will now give an algorithm for the singular value decomposition and follow for the most part the presentation of Lawson and Hanson [18], which is based on the work of Golub and Kahan [19], Businger and Golub [20], and Golub and Reinsch [21]. The strategy of the algorithm is based on the successive application of orthogonal transformations.

In the first step, the matrix A is transformed into a bidiagonal matrix C,

$$A = Q C H^{\mathrm T}. \tag{A.14.1}$$

The orthogonal matrices Q and H are themselves products of Householder-transformation matrices.

In step 2 the matrix C is brought into diagonal form by means of an iterative procedure:

$$C = U' S' V'^{\mathrm T}. \tag{A.14.2}$$

Here the matrices U' and V' are given by products of Givens-transformation matrices and, if necessary, reflection matrices. The latter ensure that all diagonal elements are non-negative.

In step 3, further orthogonal matrices U'' and V'' are determined. They are products of permutation matrices and ensure that the diagonal elements of

$$S = U''^{\mathrm T} S' V'' \tag{A.14.3}$$

are ordered so as to be non-increasing as in (A.12.3). As the result of the first three steps one obtains the matrices

$$U = Q U' U'', \qquad V = H V' V'', \tag{A.14.4}$$

which produce the singular value decomposition (A.12.1) as well as the diagonal elements $s_1, s_2, \ldots, s_k$.
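Steps 2 and 3 end with sign changes and permutations, which leave the product $U S V^{\mathrm T}$ unchanged. The following minimal Java sketch (hypothetical class SvdOrderingSketch) shows this bookkeeping. A negative diagonal element is made positive by flipping the sign of the corresponding column of V, which is a reflection. The diagonal elements are then sorted into non-increasing order by swapping the corresponding columns of U and V, which amounts to multiplying by permutation matrices. It assumes that the diagonal elements are given in an array s of length n and that U has at least n columns.

```java
/**
 * Minimal sketch of the sign fix and the ordering (A.12.3) of the diagonal elements.
 * Sum over i of s_i * u_i * v_i^T is invariant under both operations.
 * SvdOrderingSketch is a hypothetical name. Indices are 0-based.
 */
final class SvdOrderingSketch {

    static void normalize(double[][] u, double[] s, double[][] v) {
        int n = s.length;
        // Reflection: (-s_i) u_i v_i^T = s_i u_i (-v_i)^T
        for (int i = 0; i < n; i++) {
            if (s[i] < 0.0) {
                s[i] = -s[i];
                for (double[] row : v) row[i] = -row[i];
            }
        }
        // Selection sort into non-increasing order, permuting columns of U and V alongside.
        for (int i = 0; i < n - 1; i++) {
            int max = i;
            for (int j = i + 1; j < n; j++) {
                if (s[j] > s[max]) max = j;
            }
            if (max != i) {
                double t = s[i];
                s[i] = s[max];
                s[max] = t;
                swapColumns(u, i, max);
                swapColumns(v, i, max);
            }
        }
    }

    private static void swapColumns(double[][] a, int i, int j) {
        for (double[] row : a) {
            double t = row[i];
            row[i] = row[j];
            row[j] = t;
        }
    }
}
```

For example, with U = V = I and s = (−1, 3) it yields s = (3, 1), with the columns of U and V swapped and the column of V that belonged to −1 negated; U diag(s) $V^{\mathrm T}$ is still diag(−1, 3).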

In step 4 the singular value analysis is finally carried out. Stated simply, all singular values are set to zero for which

$$s_i < s_{\min}. \tag{A.14.5}$$

Thus one can finally give the solution vector x to the equation

$$A x \approx b. \tag{A.14.6}$$

In practice, Eq. (A.14.6) is often solved simultaneously for various right-hand sides, which can then be arranged in an $m \times \ell$ matrix,

$$B = (b_1, b_2, \ldots, b_\ell).$$

Instead of the solution vector x we thus obtain the $n \times \ell$ solution matrix

$$X = (x_1, x_2, \ldots, x_\ell)$$

of the equation

$$A X \approx B. \tag{A.14.7}$$

The method DatanMatrix.singularValueDecomposition computes this solution. It merely consists of four calls to other methods that carry out the four steps described above. These are explained in more detail in the following sections.

Usually one is only interested in the solution matrix X of the problem $A X \approx B$ and possibly in the number of singular values not set to zero. Sometimes, however, one would like explicit access to the matrices U and V and to the singular values. This is made possible by the method DatanMatrix.pseudoInverse.


A.14.2 Bidiagonalization

The method DatanMatrix.sv1 performs the procedure described below. (It is declared as private within the class DatanMatrix, as are the further methods referred to in Sect. A.14; of course their source code can be studied.) The goal is to find the $m \times n$ matrix C in (A.14.1). Here C is of the form

$$C = \begin{pmatrix} C' \\ 0 \end{pmatrix} \tag{A.14.8}$$

and C' is an $n \times n$ bidiagonal matrix

$$C' = \begin{pmatrix} d_1 & e_2 & & & \\ & d_2 & e_3 & & \\ & & \ddots & \ddots & \\ & & & d_{n-1} & e_n \\ & & & & d_n \end{pmatrix}. \tag{A.14.9}$$

The goal is achieved by multiplication of the matrix A with appropriate Householder matrices (alternating on the right and left):

$$C = Q_n\bigl(\cdots\bigl((Q_1 A) H_2\bigr)\cdots H_n\bigr) = Q^{\mathrm T} A H. \tag{A.14.10}$$


The matrix $Q_1$ is computed from the elements of the first column of A and is applied to all columns of A (and B). As a result, only the element (1, 1) of the first column remains non-zero. The matrix $H_2$ is computed from the first row of the matrix $Q_1 A$. It results in the element (1, 1) remaining unchanged, the element (1, 2) being recomputed, and the elements (1, 3), …, (1, n) being set equal to zero. It is applied to all rows of the matrix $Q_1 A$. One thus obtains, shown here for $m = 5$, $n = 4$,

$$Q_1 A = \begin{pmatrix} \bullet & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix}, \qquad Q_1 A H_2 = \begin{pmatrix} \bullet & \bullet & 0 & 0 \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & \times & \times & \times \end{pmatrix}.$$

Here • denotes an element of the final bidiagonal matrix C', and × an element that will be changed further. Now $Q_2$ is determined from the second column of $Q_1 A H_2$ such that upon application to this column the element 1 remains unchanged and the element 2 and all others are changed such that only element 2 remains non-zero, and the elements 3 through m become zero.

The procedure can be summarized in the following way. The matrix $Q_i$ is applied to the column vectors; it leaves the elements 1 through $i - 1$ unchanged and changes the elements i through m. It produces an orthogonal transformation in the subspace of the components i through m. At the time of applying $Q_i$, however, these elements in the columns 1 through $i - 1$ are already zero, so that $Q_i$ must only be applied explicitly to columns i through n. The corresponding considerations hold for the matrix $H_i$. It acts on the row vectors, leaves elements 1 through $i - 1$ unchanged, and produces an orthogonal transformation in the subspace of the components i through n. When it is applied, these components in the rows 1 through $i - 2$ are all zero. The matrix $H_i$ must only be applied explicitly to the rows $i - 1, \ldots, m$. Since only $n - 1$ rows of A must be processed, the matrix $H_n$ in (A.14.10) is the unit matrix; $Q_n$ is the unit matrix only if $m = n$.

In addition to the transformed matrix $Q^{\mathrm T} A H$, the matrix $H = H_2 H_3 \cdots H_{n-1}$ is also stored. For this the information about each matrix $H_i$ is saved. As we determined at the end of Sect. A.3, it is sufficient to store the quantities defined there, $u_p$ and b, and the elements $i + 1$ through n of the $(i-1)$th row vectors defining the matrix $H_i$. This can be done, however, in the array elements of these row vectors themselves, since it is these elements that will be transformed to zero and which therefore do not enter further into the calculation. One needs to declare additional variables only for the quantities $u_p$ and b of each of the matrices $H_i$. When all of the transformations have been carried out, the diagonal elements $d_i$ and the next-to-diagonal elements $e_i$ are transferred to the arrays d and e. Finally, the product matrix $H = H_2 H_3 \cdots H_{n-1} I$ is constructed in the first n rows of the array of the original matrix A, in the order $H_{n-1} I$, $H_{n-2}(H_{n-1} I)$, …. Here as well, the procedure is as economical as possible, i.e., the Householder matrix is only applied to those columns of the matrix to the right for which it would actually produce a change. The unit matrix I is constructed to the extent that it is needed, row by row in the array of the original matrix A starting from below. For this there is exactly the right amount of space available, which up to this point was necessary for storing information about the Householder transformations that were just applied.
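The following Java sketch illustrates the alternating Householder transformations of (A.14.10) for $m \ge n$. It deliberately omits what makes DatanMatrix.sv1 economical: it works on a full copy of A, does not accumulate Q or H, and does not store the Householder vectors in the vacated array elements. The class name BidiagonalSketch is hypothetical, indices are 0-based, and the sign of each reflection is chosen in the usual way to avoid cancellation, so the signs of the $d_i$ and $e_i$ may differ from those of other implementations.

```java
/**
 * Minimal sketch of the bidiagonalization C = Q^T A H (A.14.10) for m >= n.
 * Left reflections zero column i below the diagonal; right reflections zero row i
 * beyond the superdiagonal. BidiagonalSketch is a hypothetical name. Indices are 0-based.
 */
final class BidiagonalSketch {

    static double[][] bidiagonalize(double[][] a0) {
        int m = a0.length, n = a0[0].length;
        double[][] a = new double[m][];
        for (int i = 0; i < m; i++) a[i] = a0[i].clone();
        for (int i = 0; i < n; i++) {
            if (i < m - 1) reflectColumn(a, i); // Q_{i+1} in the 1-based numbering of the text
            if (i < n - 2) reflectRow(a, i);    // H_{i+2}
        }
        return a;
    }

    /** Householder from the left on rows i..m-1, applied only to columns i..n-1. */
    private static void reflectColumn(double[][] a, int i) {
        int m = a.length, n = a[0].length;
        double norm = 0.0;
        for (int r = i; r < m; r++) norm += a[r][i] * a[r][i];
        norm = Math.sqrt(norm);
        if (norm == 0.0) return;
        double[] v = new double[m];
        for (int r = i; r < m; r++) v[r] = a[r][i];
        v[i] += (a[i][i] >= 0.0) ? norm : -norm; // sign chosen to avoid cancellation
        double vv = 0.0;
        for (int r = i; r < m; r++) vv += v[r] * v[r];
        for (int j = i; j < n; j++) {
            double dot = 0.0;
            for (int r = i; r < m; r++) dot += v[r] * a[r][j];
            double f = 2.0 * dot / vv;
            for (int r = i; r < m; r++) a[r][j] -= f * v[r];
        }
    }

    /** Householder from the right on columns i+1..n-1, applied only to rows i..m-1. */
    private static void reflectRow(double[][] a, int i) {
        int m = a.length, n = a[0].length, c0 = i + 1;
        double norm = 0.0;
        for (int j = c0; j < n; j++) norm += a[i][j] * a[i][j];
        norm = Math.sqrt(norm);
        if (norm == 0.0) return;
        double[] v = new double[n];
        for (int j = c0; j < n; j++) v[j] = a[i][j];
        v[c0] += (a[i][c0] >= 0.0) ? norm : -norm;
        double vv = 0.0;
        for (int j = c0; j < n; j++) vv += v[j] * v[j];
        for (int r = i; r < m; r++) {
            double dot = 0.0;
            for (int j = c0; j < n; j++) dot += a[r][j] * v[j];
            double f = 2.0 * dot / vv;
            for (int j = c0; j < n; j++) a[r][j] -= f * v[j];
        }
    }
}
```

Since Q and H are orthogonal, the result has the same Frobenius norm as A and, for square A, $|d_1 d_2 \cdots d_n| = |\det A|$; for the matrix of Example A.1 this product is 5.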


A.14.3  Diagonalization 


This step is implemented in DatanMatrix.sv2. The bidiagonal matrix C, whose non-vanishing elements are stored in the arrays d and e, is now brought into diagonal form by appropriately chosen Givens transformations. The strategy is chosen such that the lowest non-diagonal element vanishes first and the non-diagonal elements always move up and to the left, until C is finally diagonal. All of the transformations applied to C from the left are also applied to the matrix stored in the array b, and all transformations that act from the right are also applied to the matrix in a. (We denote the matrix to be diagonalized during each step by C.)

Only the upper-left submatrix $C_k$ with k rows and columns is not yet diagonal and must still be considered. The index k is determined such that $e_k \ne 0$ and $e_j = 0$ for $j > k$. This means that the program runs through the loop $k = n, n - 1, \ldots, 2$. Before the lowest non-diagonal element is systematically made zero by means of an iterative procedure, one checks for two special cases, whose special treatment allows the computation to be shortened.

Special case 1, $d_k = 0$ (handled in DatanMatrix.s21): A Givens matrix W is applied from the right, which also causes $e_k$ to vanish. The matrix W is the product $W_{k-1}W_{k-2}\cdots W_1$. Here $W_i$ acts on the columns i and k of $C_k$, but of course only on those rows where at least one of these columns has a non-vanishing element. $W_{k-1}$ acts on the row $k-1$, annihilates the element $e_k = C_{k-1,k}$, and changes $C_{k-1,k-1}$. In addition, $W_{k-1}$ acts on the row $k-2$, changes the element $C_{k-2,k-1}$, and produces a non-vanishing element $H = C_{k-2,k}$ in column k. Now the matrix $W_{k-2}$ is applied, which annihilates exactly this element, but produces a new element in row $k-3$ and column k. When the additional element finally makes it to row 1, it can then be annihilated by the transformation $W_1$. As a result of this treatment of special case 1, $C_k$ decomposes into a $(k-1)\times(k-1)$ submatrix $C_{k-1}$ and a $1\times 1$ null matrix.

Special case 2, $C_k$ decomposes into submatrices (handled in DatanMatrix.s22): If $e_\ell = 0$ for any value $\ell$, $2 \le \ell \le k$, then the matrix

$$C_k = \begin{pmatrix}
d_1 & e_2 & & & & & \\
 & \ddots & \ddots & & & & \\
 & & d_{\ell-1} & 0 & & & \\
 & & & d_\ell & e_{\ell+1} & & \\
 & & & & \ddots & \ddots & \\
 & & & & & d_{k-1} & e_k \\
 & & & & & & d_k
\end{pmatrix}$$

can be decomposed into two matrices $C_{\ell-1}$ and $\bar C$. One can first diagonalize $\bar C$ and then $C_{\ell-1}$. In particular, if $\ell = k$, then one obtains

$$C_k = \begin{pmatrix} C_{k-1} & 0 \\ 0 & d_k \end{pmatrix}.$$

Here $d_k$ is the singular value, and the loop index can simply be decreased by one, $k \to k-1$. First, however, the case $d_k < 0$ must be treated; see "Change of sign" below. If $d_\ell = 0$ but $e_{\ell+1} \neq 0$, then C can still be decomposed. For this one uses the transformation matrix $T = T_kT_{k-1}\cdots T_{\ell+1}$ acting on the left, where $T_i$ acts on the rows $\ell$ and $i$. In particular, $T_{\ell+1}$ annihilates the element $e_{\ell+1} = C_{\ell,\ell+1}$ and creates an element $H = C_{\ell,\ell+2}$; $T_{\ell+2}$ annihilates this element and creates in its place $H = C_{\ell,\ell+3}$. Finally $T_k$ annihilates the last created element $H = C_{\ell k}$ by a transformation involving $C_{kk} = d_k$.

After it has been checked whether one or both of the special cases were present and such an occurrence has been treated accordingly, it remains only to diagonalize the matrix $\bar C$. This consists of the rows and columns $\ell$ through $k$ of C. If special case 2 was not present, then one has $\ell = 1$. The problem is solved iteratively with the QR algorithm.

QR algorithm (carried out in DatanMatrix.s23): First we denote the square starting matrix $\bar C$ by $C_1$. We then determine orthogonal matrices $U_i$, $V_i$ and carry out transformations

$$C_{i+1} = U_i^{\mathrm T}C_iV_i, \qquad i = 1, 2, \ldots,$$

which lead to a diagonal matrix S,

$$\lim_{i\to\infty} C_i = S.$$

The following prescription is used for determining $U_i$ and $V_i$:

(A) One determines the eigenvalues $\lambda_1$, $\lambda_2$ of the lower-right $2\times 2$ submatrix of $C_i^{\mathrm T}C_i$. Here $\sigma_i$ is the eigenvalue closest to the lower-right element of $C_i^{\mathrm T}C_i$.

(B) The matrix $V_i$ is determined such that $V_i^{\mathrm T}(C_i^{\mathrm T}C_i - \sigma_iI)$ has upper triangular form.

(C) The matrix $U_i$ is determined such that $C_{i+1} = U_i^{\mathrm T}C_iV_i$ is again bidiagonal.

The matrix $V_i$ from step (B) exists, according to a theorem by Francis [22], if

(a) $C_i^{\mathrm T}C_i$ is tridiagonal with non-vanishing subdiagonal elements,

(b) $V_i$ is orthogonal,

(c) $\sigma_i$ is an arbitrary scalar,

(d) $V_i^{\mathrm T}(C_i^{\mathrm T}C_i)V_i$ is tridiagonal, and

(e) the first column of $V_i^{\mathrm T}(C_i^{\mathrm T}C_i - \sigma_iI)$ vanishes except for the first element.

Requirement (a) is fulfilled, since $C_i$ is bidiagonal and special case 2 has been treated if necessary; (b) is also fulfilled by constructing $V_i$ as the product of Givens matrices. This is done in such a way that (d) and (e) are fulfilled simultaneously. In particular one has

$$V_i = R_1R_2\cdots R_{n-1}, \qquad U_i^{\mathrm T} = L_{n-1}L_{n-2}\cdots L_1,$$

where

$R_j$ acts on the columns $j$ and $j+1$ of C,

$L_j$ acts on the rows $j$ and $j+1$ of C,

$R_1$ is determined such that requirement (e) is fulfilled,

$L_1, R_2, L_2, \ldots$ are determined such that (d) is fulfilled without violating (e).

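For $\sigma_i$ one obtains

$$\sigma_i = d_n^2 + e_n\left(e_n + \frac{d_{n-1}}{t}\right)$$

with

$$t = \begin{cases} f + \sqrt{1+f^2}, & f \ge 0, \\ f - \sqrt{1+f^2}, & f < 0, \end{cases}$$

and

$$f = \frac{d_n^2 - d_{n-1}^2 + e_n^2 - e_{n-1}^2}{2e_nd_{n-1}}.$$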
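As a numerical check of the formulas for $\sigma_i$, one may compare them with the eigenvalue obtained directly from the lower-right $2\times 2$ block of $C_i^{\mathrm T}C_i$, whose elements are $d_{n-1}^2 + e_{n-1}^2$, $d_{n-1}e_n$ and $d_n^2 + e_n^2$. The Java sketch below does this; the class ShiftSketch is hypothetical, and it assumes $e_n \neq 0$ and $d_{n-1} \neq 0$ so that f is defined.

```java
/** Hypothetical check of the shift formula for sigma_i; not part of DatanMatrix. */
final class ShiftSketch {

    /** sigma from f and t; dPrev = d_{n-1}, d = d_n, ePrev = e_{n-1}, e = e_n (e_n, d_{n-1} nonzero). */
    static double shift(double dPrev, double d, double ePrev, double e) {
        double f = (d * d - dPrev * dPrev + e * e - ePrev * ePrev) / (2.0 * e * dPrev);
        double root = Math.sqrt(1.0 + f * f);
        double t = (f >= 0.0) ? f + root : f - root;  // larger-magnitude root avoids cancellation
        return d * d + e * (e + dPrev / t);
    }

    /** Eigenvalue of the lower-right 2 x 2 block of C^T C closest to its lower-right element. */
    static double shiftDirect(double dPrev, double d, double ePrev, double e) {
        double a = dPrev * dPrev + ePrev * ePrev;  // (C^T C)_{n-1,n-1}
        double c = d * d + e * e;                  // (C^T C)_{n,n}
        double b = dPrev * e;                      // (C^T C)_{n-1,n}
        double mean = 0.5 * (a + c);
        double rad = Math.hypot(0.5 * (a - c), b);
        double l1 = mean + rad;
        double l2 = mean - rad;
        return Math.abs(l1 - c) < Math.abs(l2 - c) ? l1 : l2;
    }

    public static void main(String[] args) {
        System.out.println(shift(2.0, 1.0, 0.5, 0.3) + " " + shiftDirect(2.0, 1.0, 0.5, 0.3));
    }
}
```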


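The first column of the matrix $(C_i^{\mathrm T}C_i - \sigma_iI)$ is

$$\begin{pmatrix} d_1^2 - \sigma_i \\ d_1e_2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

One determines the matrix $R_1$, which defines a Givens transformation, such that all elements of the first column of $R_1^{\mathrm T}(C_i^{\mathrm T}C_i - \sigma_iI)$ except the first vanish. Application of $R_1$ to $C_i$ produces, however, an additional element $H = C_{21}$, so that $C_i$ is no longer bidiagonal,

$$C_iR_1 = \begin{pmatrix}
\times & \times & & & \\
H & \times & \times & & \\
 & & \times & \times & \\
 & & & \ddots & \ddots \\
 & & & & \times
\end{pmatrix}.$$

By application of $L_1$, this element is projected onto the diagonal. In its place a new element $H = C_{13}$ is created,

$$L_1C_iR_1 = \begin{pmatrix}
\times & \times & H & & \\
 & \times & \times & & \\
 & & \times & \times & \\
 & & & \ddots & \ddots \\
 & & & & \times
\end{pmatrix}.$$

By continuing the procedure, the additional element is moved further down and to the right, and it can be completely eliminated in the last step:

$$\begin{pmatrix}
\times & \times & & \\
 & \ddots & \ddots & \\
 & & \times & \times \\
 & & H & \times
\end{pmatrix}
\;\xrightarrow{\;L_{n-1}\;}\;
\begin{pmatrix}
\times & \times & & \\
 & \ddots & \ddots & \\
 & & \times & \times \\
 & & & \times
\end{pmatrix} = C_{i+1}.$$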


If the lower non-diagonal element of $C_{i+1}$ is already zero, then the lower diagonal element is already a singular value. Otherwise the procedure is repeated, whereby it is first checked whether one of the two special cases is now present. The procedure typically converges in about 2k steps (k is the rank of the original matrix A). If convergence has still not been reached after 10k steps, then the algorithm is terminated without success.

Change of sign. If a singular value $d_k$ has been found, i.e., if $e_k = 0$, then it is checked whether it is negative. If this is the case, then a simple orthogonal transformation is applied that multiplies the element $d_k = C_{kk}$ of C and all elements in the kth column of the matrix contained in a by $-1$. The index k can then be reduced by one. Corresponding to (A.12.6) the matrix B is multiplied on the left by $U^{\mathrm T}$. This is done successively with the individual factors making up $U^{\mathrm T}$.


A.14.4  Ordering  of  the  Singular  Values  and  Permutation

By the method DatanMatrix.sv3 the singular values are put into non-increasing order. This is done by a sequence of permutations of neighboring singular values, carried out whenever a singular value is larger than the one preceding it. The matrices stored in a and b are multiplied by a corresponding sequence of permutation matrices; cf. (A.14.4).
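The reordering can be sketched as a simple exchange sort (bubble sort) in Java. The sketch assumes, for illustration only, that the array for a holds V, whose columns are exchanged, and that the array for b holds $U^{\mathrm T}B$, whose rows are exchanged; the class SvOrderSketch is hypothetical and not DatanMatrix.sv3.

```java
/** Hypothetical sketch of the reordering step; not DatanMatrix.sv3. */
final class SvOrderSketch {

    /**
     * Puts s into non-increasing order by swapping neighbours. Every swap of s[i], s[i+1]
     * is mirrored on columns i, i+1 of v (assumed to hold V) and on rows i, i+1 of g
     * (assumed to hold U^T B), so that the factorization stays consistent.
     */
    static void sortDescending(double[] s, double[][] v, double[][] g) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i + 1 < s.length; i++) {
                if (s[i + 1] > s[i]) {
                    double ts = s[i]; s[i] = s[i + 1]; s[i + 1] = ts;
                    for (double[] row : v) {
                        double tv = row[i]; row[i] = row[i + 1]; row[i + 1] = tv;
                    }
                    double[] tg = g[i]; g[i] = g[i + 1]; g[i + 1] = tg;
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        double[] s = {1.0, 3.0, 2.0};
        double[][] v = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        double[][] g = {{10}, {30}, {20}};
        sortDescending(s, v, g);
        System.out.println(java.util.Arrays.toString(s));  // now in non-increasing order
    }
}
```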


A.14.5  Singular  Value  Analysis 

In the last step the singular value analysis is carried out as described in Sect. A.13 by the method DatanMatrix.sv4. For a given factor $f \ll 1$ a value $\ell \le k$ is determined such that $s_i < fs_1$ for $i > \ell$. The columns of the array B, which now contains the vectors g, are transformed in their first $\ell$ elements into p according to (A.12.11), and the elements $\ell+1, \ldots, n$ are set equal to zero. Then the solution vectors x, which make up the columns of the solution matrix X, are computed according to (A.12.12).
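A minimal Java sketch of the truncation for a single right-hand side, assuming the singular values are already ordered, V is available as an $n\times n$ array, and g holds the first n components of $U^{\mathrm T}b$. Components whose singular value falls below $fs_1$ receive $p_i = 0$, the others $p_i = g_i/s_i$, and $x = Vp$. The class SvAnalysisSketch is hypothetical.

```java
/** Hypothetical sketch of the singular value analysis; not DatanMatrix.sv4. */
final class SvAnalysisSketch {

    /**
     * Returns x = V p with p_i = g_i / s_i if s_i >= f * s_1 and p_i = 0 otherwise.
     * Assumes s is sorted in non-increasing order, v holds the n x n matrix V and
     * g holds the first n components of U^T b for one right-hand side b.
     */
    static double[] solve(double[] s, double[][] v, double[] g, double f) {
        int n = s.length;
        double[] p = new double[n];
        for (int i = 0; i < n; i++) {
            boolean kept = s[i] > 0.0 && s[i] >= f * s[0];
            p[i] = kept ? g[i] / s[i] : 0.0;  // dropped components contribute nothing
        }
        double[] x = new double[n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++) x[r] += v[r][c] * p[c];
        return x;
    }

    public static void main(String[] args) {
        double[][] v = {{1, 0}, {0, 1}};
        double[] s = {2.0, 1e-14};
        double[] g = {4.0, 1.0};
        System.out.println(java.util.Arrays.toString(solve(s, v, g, 1e-10))); // tiny s_2 discarded
        System.out.println(java.util.Arrays.toString(solve(s, v, g, 1e-15))); // tiny s_2 kept
    }
}
```

The second call shows why the analysis matters: keeping a singular value near the rounding level produces a solution component of order $10^{14}$.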


A.15  Least  Squares  with  Weights 

Instead of the problem (A.5.3),

$$r^2 = (Ax - b)^{\mathrm T}(Ax - b) = \min, \tag{A.15.1}$$

one often encounters a similar problem that in addition contains a positive-definite symmetric $m\times m$ weight matrix G,

$$r^2 = (Ax - b)^{\mathrm T}G(Ax - b) = \min. \tag{A.15.2}$$

In (A.15.1) one simply has $G = I$. Using the Cholesky decomposition (A.9.1) of G, i.e., $G = U^{\mathrm T}U$, one has

$$r^2 = (Ax - b)^{\mathrm T}U^{\mathrm T}U(Ax - b) = \min. \tag{A.15.3}$$

With the definitions

$$A' = UA, \qquad b' = Ub, \tag{A.15.4}$$

Eq. (A.15.3) takes on the form

$$r^2 = (A'x - b')^{\mathrm T}(A'x - b') = \min. \tag{A.15.5}$$

After the replacement (A.15.4), the problem (A.15.2) is thus equivalent to the original one (A.15.1).

In Sect. A.11 we called the $n\times n$ matrix

$$C = (A^{\mathrm T}A)^{-1} \tag{A.15.6}$$

the unweighted covariance matrix of the unknowns x in Problem (A.15.1). In Problem (A.15.2), the weighted covariance matrix

$$C_x = (A'^{\mathrm T}A')^{-1} = (A^{\mathrm T}GA)^{-1} \tag{A.15.7}$$

appears in its place.
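The substitution (A.15.4) can be sketched in Java as follows. The Cholesky routine uses the standard row-by-row recurrence for an upper triangular U with $G = U^{\mathrm T}U$; the class WeightedLsqSketch, its methods, and the numbers in main are hypothetical illustrations rather than the book's DatanMatrix methods.

```java
/** Hypothetical sketch of the weighting substitution A' = U A, b' = U b with G = U^T U. */
final class WeightedLsqSketch {

    /** Cholesky decomposition G = U^T U of a symmetric positive-definite G; U is upper triangular. */
    static double[][] cholesky(double[][] g) {
        int m = g.length;
        double[][] u = new double[m][m];
        for (int i = 0; i < m; i++) {
            double diag = g[i][i];
            for (int k = 0; k < i; k++) diag -= u[k][i] * u[k][i];
            if (diag <= 0.0) throw new IllegalArgumentException("G is not positive definite");
            u[i][i] = Math.sqrt(diag);
            for (int j = i + 1; j < m; j++) {
                double off = g[i][j];
                for (int k = 0; k < i; k++) off -= u[k][i] * u[k][j];
                u[i][j] = off / u[i][i];
            }
        }
        return u;
    }

    /** Ordinary matrix product p q. */
    static double[][] multiply(double[][] p, double[][] q) {
        double[][] r = new double[p.length][q[0].length];
        for (int i = 0; i < p.length; i++)
            for (int k = 0; k < q.length; k++)
                for (int j = 0; j < q[0].length; j++) r[i][j] += p[i][k] * q[k][j];
        return r;
    }

    static double[][] transpose(double[][] p) {
        double[][] t = new double[p[0].length][p.length];
        for (int i = 0; i < p.length; i++)
            for (int j = 0; j < p[0].length; j++) t[j][i] = p[i][j];
        return t;
    }

    public static void main(String[] args) {
        double[][] g = {{4.0, 2.0}, {2.0, 3.0}};                                  // made-up weight matrix
        double[][] u = cholesky(g);
        double[][] aPrime = multiply(u, new double[][] {{1.0, 0.0}, {1.0, 1.0}}); // A' = U A
        double[][] bPrime = multiply(u, new double[][] {{1.0}, {2.0}});           // b' = U b as a column
        System.out.println(java.util.Arrays.deepToString(multiply(transpose(u), u))); // reproduces G
        System.out.println(java.util.Arrays.deepToString(aPrime) + " " + java.util.Arrays.deepToString(bPrime));
    }
}
```

For the made-up weight matrix in main one finds $U = \begin{pmatrix} 2 & 1 \\ 0 & \sqrt 2 \end{pmatrix}$, and for any x the weighted residual $(Ax - b)^{\mathrm T}G(Ax - b)$ equals the ordinary squared residual of $A'x - b'$, which is the content of (A.15.5).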


A.16  Least  Squares  with  Change  of  Scale

The goal in solving a problem of the type

$$(Ax - b)^{\mathrm T}(Ax - b) = \min \tag{A.16.1}$$

is the most accurate numerical determination of the solution vector x and the covariance matrix C. A change of scale in the elements of A, b, and x can lead to an improvement in the numerical precision.

Let us assume that the person performing the computation already has an approximate idea of x and C, which we call z and K. The matrix K has the Cholesky decomposition $K = L^{\mathrm T}L$. By defining

$$A' = AL, \qquad b' = b - Az, \qquad x' = L^{-1}(x - z), \tag{A.16.2}$$

Eq. (A.16.1) becomes

$$(A'x' - b')^{\mathrm T}(A'x' - b') = \min.$$

The meaning of the new vector of unknowns x' is easily recognizable for the case where K is a diagonal matrix. We write

$$K = \begin{pmatrix} \sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_n^2 \end{pmatrix}, \qquad L = \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{pmatrix},$$

where the quantities $\sigma_i^2$ are the estimated values of the variances of the unknowns $x_i$. Thus the ith component of the vector x' becomes

$$x_i' = \frac{x_i - z_i}{\sigma_i}.$$

If $z_i$ is in fact equal to $x_i$ and $\sigma_i^2$ is the corresponding variance, then $x_i' = 0$ and $x_i'$ has a variance of one. If the estimates are at least of the correct order of magnitude, the $x_i'$ are close to zero and their variances are of order unity. In addition, in case the full matrix K is, in fact, estimated with sufficient accuracy, the components of x' are not strongly correlated with each other.

In practice, one carries out the transformation (A.16.2) only in exceptional cases. One must take care, however, that "reasonable" variables are chosen for (A.16.2). This technique is applied in the graphical representation of data. If it is known, for example, that a voltage U varies in the region $\sigma = 10$ mV about the value $U_0 = 1$ V, then instead of U one would plot the quantity $U' = (U - U_0)/\sigma$, or some similar quantity.
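For a diagonal K the transformation (A.16.2) only rescales the columns of A and shifts the right-hand side. The hypothetical Java sketch below covers just this diagonal case, $L = \mathrm{diag}(\sigma_i)$; it can be used to confirm that $A'x' - b' = Ax - b$, so the minimization problem itself is unchanged.

```java
/** Hypothetical sketch of the change of scale (A.16.2) for a diagonal K = diag(sigma_i^2). */
final class ScaleSketch {

    /** x'_i = (x_i - z_i) / sigma_i, i.e. x' = L^{-1}(x - z) with L = diag(sigma_i). */
    static double[] toScaled(double[] x, double[] z, double[] sigma) {
        double[] xp = new double[x.length];
        for (int i = 0; i < x.length; i++) xp[i] = (x[i] - z[i]) / sigma[i];
        return xp;
    }

    /** Inverse transformation x = z + L x'. */
    static double[] fromScaled(double[] xp, double[] z, double[] sigma) {
        double[] x = new double[xp.length];
        for (int i = 0; i < xp.length; i++) x[i] = z[i] + sigma[i] * xp[i];
        return x;
    }

    /** A' = A L: column j of A is multiplied by sigma_j. */
    static double[][] scaleMatrix(double[][] a, double[] sigma) {
        double[][] ap = new double[a.length][sigma.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < sigma.length; j++) ap[i][j] = a[i][j] * sigma[j];
        return ap;
    }

    /** b' = b - A z. */
    static double[] shiftRhs(double[][] a, double[] b, double[] z) {
        double[] bp = b.clone();
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < z.length; j++) bp[i] -= a[i][j] * z[j];
        return bp;
    }

    public static void main(String[] args) {
        // Voltage example from the text: U0 = 1 V, sigma = 10 mV; a reading of 1.02 V maps to about 2.
        double[] up = toScaled(new double[] {1.02}, new double[] {1.0}, new double[] {0.01});
        System.out.println(up[0]);
    }
}
```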


A.17  Modification  of  Least  Squares  According  to  Marquardt


Instead of the problem

$$(Ax - b)^{\mathrm T}(Ax - b) = \min, \tag{A.17.1}$$

which we have also written in the shorter form

$$Ax - b \approx 0, \tag{A.17.2}$$

let us now consider the modified problem

$$\begin{pmatrix} A \\ \lambda I \end{pmatrix} x \approx \begin{pmatrix} b \\ 0 \end{pmatrix}, \tag{A.17.3}$$

in which the upper blocks have m rows and the lower blocks n rows. Here I is the $n\times n$ unit matrix and λ is a non-negative number. The modified problem is of considerable importance for fitting nonlinear functions with the method of least squares (Sect. 9.5) or for minimization (Sect. 10.15). For $\lambda = 0$ Eq. (A.17.3) clearly becomes (A.17.2). If, on the other hand, λ is very large, or more precisely, if it is large compared to the absolute values of the elements of A and b, then the last "row" of (A.17.3) determines the solution x, which is the null vector for $\lambda \to \infty$.

We first ask which direction the vector x has for large values of λ. The normal equations corresponding to (A.17.3), cf. (A.5.16), are

$$(A^{\mathrm T}, \lambda I)\begin{pmatrix} A \\ \lambda I \end{pmatrix} x = (A^{\mathrm T}A + \lambda^2I)x = (A^{\mathrm T}, \lambda I)\begin{pmatrix} b \\ 0 \end{pmatrix} = A^{\mathrm T}b$$

with the solution

$$x = (A^{\mathrm T}A + \lambda^2I)^{-1}A^{\mathrm T}b.$$

For large λ the second term in parentheses dominates, and one obtains simply

$$x = \lambda^{-2}A^{\mathrm T}b.$$

That is, for large values of λ, the solution vector tends toward the direction of the vector $A^{\mathrm T}b$.

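We will now show that for a given λ the solution $x^{(\lambda)}$ to (A.17.3) can easily be found with the singular value decomposition simultaneously with the determination of the solution x of (A.17.2). The singular value decomposition (A.12.1) of A is

$$A = USV^{\mathrm T}.$$

By substituting into (A.17.3) and multiplying on the left we obtain

$$\begin{pmatrix} U^{\mathrm T} & 0 \\ 0 & V^{\mathrm T} \end{pmatrix}\begin{pmatrix} A \\ \lambda I \end{pmatrix} x = \begin{pmatrix} S \\ \lambda I \end{pmatrix} p \approx \begin{pmatrix} U^{\mathrm T} & 0 \\ 0 & V^{\mathrm T} \end{pmatrix}\begin{pmatrix} b \\ 0 \end{pmatrix} = \begin{pmatrix} g \\ 0 \end{pmatrix}, \tag{A.17.4}$$

where, using the notation as in Sect. A.12,

$$p = V^{\mathrm T}x, \qquad g = U^{\mathrm T}b.$$

By means of Givens transformations, the matrix on the left-hand side of (A.17.4) can be brought into diagonal form. For each $i = 1, \ldots, n$ the two rows containing $s_i$ and λ are combined, and one obtains

$$s_i^{(\lambda)} = \sqrt{s_i^2 + \lambda^2}, \qquad g_i^{(\lambda)} = \frac{g_is_i}{s_i^{(\lambda)}}, \qquad i = 1, \ldots, n,$$

and thus

$$p_i^{(\lambda)} = \frac{g_i^{(\lambda)}}{s_i^{(\lambda)}} = \frac{g_is_i}{s_i^2 + \lambda^2}, \qquad i = 1, \ldots, k,$$

$$p_i^{(\lambda)} = 0, \qquad i = k+1, \ldots, n.$$

The solution of (A.17.3) is then

$$x^{(\lambda)} = Vp^{(\lambda)}.$$
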
The method DatanMatrix.marquardt computes these solution vectors for two values of λ. It proceeds mostly like DatanMatrix.singularValueDecomposition; only in step 4, instead of DatanMatrix.sv4, the method DatanMatrix.svm is used, which is adapted to the Marquardt problem.
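Once the singular value decomposition is known, each further value of λ only changes the factors $s_i/(s_i^2 + \lambda^2)$. The Java sketch below, with the hypothetical class MarquardtSketch (not DatanMatrix.svm), evaluates $p^{(\lambda)}$ and $x^{(\lambda)} = Vp^{(\lambda)}$ from given $s_i$, V and g, assuming V is an $n\times n$ array and g holds the first n components of $U^{\mathrm T}b$.

```java
/** Hypothetical sketch of the Marquardt-modified solution; not DatanMatrix.svm. */
final class MarquardtSketch {

    /**
     * Returns x(lambda) = V p(lambda) with p_i = g_i s_i / (s_i^2 + lambda^2).
     * Assumes v holds the n x n matrix V and g the first n components of U^T b.
     */
    static double[] solve(double[] s, double[][] v, double[] g, double lambda) {
        int n = s.length;
        double[] p = new double[n];
        for (int i = 0; i < n; i++) {
            double denom = s[i] * s[i] + lambda * lambda;
            p[i] = denom > 0.0 ? g[i] * s[i] / denom : 0.0;  // s_i = lambda = 0 gives p_i = 0
        }
        double[] x = new double[n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++) x[r] += v[r][c] * p[c];
        return x;
    }

    public static void main(String[] args) {
        double[][] v = {{1, 0}, {0, 1}};
        double[] s = {2.0, 1.0};
        double[] g = {4.0, 3.0};
        for (double lambda : new double[] {0.0, 1.0, 1e4}) {
            System.out.println(lambda + ": " + java.util.Arrays.toString(solve(s, v, g, lambda)));
        }
    }
}
```

For $\lambda = 0$ the sketch reproduces $p_i = g_i/s_i$; for very large λ the components of $p^{(\lambda)}$ become proportional to $s_ig_i$, i.e., to the components of $A^{\mathrm T}b = VS^{\mathrm T}g$ in the basis V, in line with the limit derived above.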


A.18  Least  Squares  with  Constraints


One often encounters the problem (A.5.3)

$$r^2 = (Ax - b)^2 = \min \tag{A.18.1}$$

with the constraint

$$Ex = d. \tag{A.18.2}$$

Here A is as before an $m\times n$ matrix and E is an $\ell\times n$ matrix. We will restrict ourselves to the only case that occurs in practice,

$$\operatorname{rank} E = \ell < n. \tag{A.18.3}$$

The determination of an extreme value with constraints is usually treated in analysis textbooks with the method of Lagrange multipliers. Here as well we rely on orthogonal transformations. The following method is due to Lawson and Hanson [18]. It uses a basis of the null space of E. First we carry out an orthogonal decomposition of E as in (A.5.5),

$$E = HRK^{\mathrm T}. \tag{A.18.4}$$


Here we regard the orthogonal $n\times n$ matrix K as being constructed out of an $n\times\ell$ matrix $K_1$ and an $n\times(n-\ell)$ matrix $K_2$,

$$K = (K_1, K_2). \tag{A.18.5}$$

According to (A.5.6) and (A.5.7), all solutions of (A.18.2) have the form

$$x = K_1\tilde p_1 + K_2p_2 = \tilde x + K_2p_2. \tag{A.18.6}$$

Here $\tilde x$ is the unique solution of minimum absolute value of (A.18.2). For brevity we will write this in the form $\tilde x = E^+d$; cf. (A.10.3). $p_2$ is an arbitrary $(n-\ell)$-vector, since the vectors $K_2p_2$ form the null space of E,

$$EK_2p_2 = 0. \tag{A.18.7}$$

The constraint (A.18.2) thus says that the vector x for which (A.18.1) is a minimum must come from the set of all vectors of the form (A.18.6), i.e.,

$$(Ax - b)^2 = (A(\tilde x + K_2p_2) - b)^2 = (AK_2p_2 - (b - A\tilde x))^2 = \min. \tag{A.18.8}$$

This relation is a least-squares problem without constraints, from which the $(n-\ell)$-vector $p_2$ can be determined. We write its solution using (A.10.3) in the form

$$p_2 = (AK_2)^+(b - A\tilde x). \tag{A.18.9}$$

By substitution into (A.18.6) we finally obtain

$$x = \tilde x + K_2(AK_2)^+(b - A\tilde x) = E^+d + K_2(AK_2)^+(b - AE^+d) \tag{A.18.10}$$

as the solution of (A.18.1) with the constraint (A.18.2).

The following prescription leads to the solution (A.18.10). Its starting point is the fact that one can set $H = I$ because of (A.18.3).

Step 1: One determines an orthogonal matrix $K = (K_1, K_2)$ as in (A.18.5) such that

$$EK = (EK_1, EK_2) = (E_1, 0)$$

and such that $E_1$ is a lower triangular matrix. In addition one computes

$$AK = (AK_1, AK_2) = (A_1, A_2).$$

Step 2: One determines the solution $\tilde p_1$ of

$$E_1\tilde p_1 = d.$$

This is easy, since $E_1$ is a lower triangular matrix of rank $\ell$. Clearly one has $\tilde x = K_1\tilde p_1$.

Step 3: One determines the vector

$$\tilde b = b - A_1\tilde p_1 = b - AK_1\tilde p_1 = b - A\tilde x.$$

Step 4: One determines the solution $\tilde p_2$ to the least-squares problem (A.18.8) (without constraints)

$$(A_2p_2 - \tilde b)^2 = \min.$$

Step 5: From the results of steps 2 and 4 one finds the solution to (A.18.1) with the constraint (A.18.2)

$$x = K\begin{pmatrix} \tilde p_1 \\ \tilde p_2 \end{pmatrix} = K_1\tilde p_1 + K_2\tilde p_2.$$


We will now consider a simple example that illustrates both the least-squares problem with constraints and the method of solution given above.


Example A.5: Least squares with constraints
Suppose the relation (A.18.1) has the simple form

$$r^2 = x^2 = \min$$

for $n = 2$. One then has $m = n = 2$, $A = I$, and $b = 0$. Suppose the constraint is

$$x_1 + x_2 = 1,$$

i.e., $\ell = 1$, $E = (1, 1)$, $d = 1$.

The problem has been chosen such that it can be solved by inspection without mathematical complications. The function $z = x^2 = x_1^2 + x_2^2$ corresponds to a paraboloid in the $(x_1, x_2, z)$ space, whose minimum is at $x_1 = x_2 = 0$. We want, however, to find not the minimum in the entire $(x_1, x_2)$ plane, but rather only on the line $x_1 + x_2 = 1$, as shown in Fig. A.7. It clearly lies at the point where the line has its smallest distance from the origin, i.e., at $x_1 = x_2 = 1/2$.

Fig. A.7: The solution x to Example A.5 lies on the line given by the constraint $x_1 + x_2 = 1$.


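Of course one obtains the same result with the algorithm. With

$$K = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$

we obtain $E_1 = \sqrt 2$, $\tilde p_1 = 1/\sqrt 2$,

$$A_1 = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad A_2 = \frac{1}{\sqrt 2}\begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \tilde b = b - A_1\tilde p_1 = -\frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

We solve the problem $(A_2p_2 - \tilde b)^2 = \min$ with the normal equations,

$$\tilde p_2 = (A_2^{\mathrm T}A_2)^{-1}A_2^{\mathrm T}\tilde b = 0,$$

since $A_2^{\mathrm T}\tilde b = 0$. The full solution is then

$$x = K\begin{pmatrix} \tilde p_1 \\ \tilde p_2 \end{pmatrix} = \frac{1}{\sqrt 2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1/\sqrt 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.$$

The method DatanMatrix.leastSquaresWithConstraints solves the problem of least squares $(Ax - b)^2 = \min$ with the linear constraint $Ex = d$.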
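Steps 1 to 5 can be sketched in Java for the special case of a single constraint ($\ell = 1$). This is an illustration under stated assumptions, not the book's leastSquaresWithConstraints: K is chosen as one Householder reflection, so $E_1$ may come out as $-\sqrt 2$ rather than $\sqrt 2$ in Example A.5 (x does not depend on this sign), and Step 4 uses the normal equations with Gaussian elimination instead of a singular value decomposition, which assumes that $A_2$ has full column rank. The class ConstrainedLsqSketch is hypothetical.

```java
/** Hypothetical sketch of least squares with one linear constraint (l = 1), Steps 1-5. */
final class ConstrainedLsqSketch {

    /** Minimizes |A x - b|^2 subject to e . x = d (e must not be the zero vector). */
    static double[] solve(double[][] a, double[] b, double[] e, double d) {
        int m = a.length, n = e.length;
        // Step 1: Householder reflection K = I - 2 v v^T / (v^T v) with e^T K = (alpha, 0, ..., 0).
        double norm = 0.0;
        for (double ei : e) norm += ei * ei;
        norm = Math.sqrt(norm);
        double alpha = e[0] > 0.0 ? -norm : norm;  // sign chosen to avoid cancellation in v[0]
        double[] v = e.clone();
        v[0] -= alpha;
        double vv = 0.0;
        for (double vi : v) vv += vi * vi;
        double[][] kMat = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) kMat[i][j] = (i == j ? 1.0 : 0.0) - 2.0 * v[i] * v[j] / vv;
        // A K = (A1, A2): column 0 is A1, columns 1..n-1 form A2.
        double[][] ak = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int l = 0; l < n; l++) ak[i][j] += a[i][l] * kMat[l][j];
        // Step 2: E1 p1 = d, where E1 = alpha is a 1 x 1 matrix.
        double p1 = d / alpha;
        // Step 3: b~ = b - A1 p1.
        double[] bt = new double[m];
        for (int i = 0; i < m; i++) bt[i] = b[i] - ak[i][0] * p1;
        // Step 4: unconstrained problem A2 p2 ~ b~, solved here via the normal equations.
        int r = n - 1;
        double[][] aug = new double[r][r + 1];     // (A2^T A2 | A2^T b~)
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < r; j++)
                for (int l = 0; l < m; l++) aug[i][j] += ak[l][i + 1] * ak[l][j + 1];
            for (int l = 0; l < m; l++) aug[i][r] += ak[l][i + 1] * bt[l];
        }
        double[] p2 = gauss(aug);
        // Step 5: x = K (p1, p2)^T.
        double[] p = new double[n];
        p[0] = p1;
        System.arraycopy(p2, 0, p, 1, r);
        double[] x = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) x[i] += kMat[i][j] * p[j];
        return x;
    }

    /** Gaussian elimination with partial pivoting on an r x (r + 1) augmented matrix. */
    static double[] gauss(double[][] aug) {
        int r = aug.length;
        for (int c = 0; c < r; c++) {
            int piv = c;
            for (int i = c + 1; i < r; i++) if (Math.abs(aug[i][c]) > Math.abs(aug[piv][c])) piv = i;
            double[] tmp = aug[c]; aug[c] = aug[piv]; aug[piv] = tmp;
            for (int i = c + 1; i < r; i++) {
                double factor = aug[i][c] / aug[c][c];
                for (int j = c; j <= r; j++) aug[i][j] -= factor * aug[c][j];
            }
        }
        double[] x = new double[r];
        for (int i = r - 1; i >= 0; i--) {
            double s = aug[i][r];
            for (int j = i + 1; j < r; j++) s -= aug[i][j] * x[j];
            x[i] = s / aug[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] identity = {{1.0, 0.0}, {0.0, 1.0}};
        double[] x = solve(identity, new double[] {0.0, 0.0}, new double[] {1.0, 1.0}, 1.0);
        System.out.println(x[0] + " " + x[1]);     // Example A.5: both close to 0.5
    }
}
```

Applied to the triangle data of Example Program A.10 ($A = I$, $b = (89, 31, 62)^{\mathrm T}$, $E = (1, 1, 1)$, $d = 180$), the sketch distributes the excess of 2 equally and gives $x_i = b_i - 2/3$.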
Java Classes for Vector and Matrix Operations
DatanVector contains methods for vector operations.
DatanMatrix contains methods for matrix operations.


Example Program A.1: The class E1Mtx demonstrates simple operations of matrix and vector algebra
At first the matrices

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 5 & 4 \end{pmatrix}$$

5 

4 

3 


and the vectors u, v, and w are defined. Then, with the appropriate methods, simple operations are performed with these quantities. Finally, the resulting matrices and vectors are displayed. The operations are

$R = A$, $R = A + B$, $R = A - B$, $R = AC$, $S = AB^{\mathrm T}$, $T = A^{\mathrm T}B$, $R = I$, $R = 0.5A$, $R = A^{\mathrm T}$, $z = w$, $x = u + v$, $x = u - v$, $d = u^{\mathrm T}v$, $x = 0.5u$, $x = 0$.
Example Program A.2: The class E2Mtx demonstrates the handling of submatrices and subvectors.

The matrices

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$$

and D, as well as the vectors u and w, are defined.

Next a submatrix S is taken from A, and a submatrix of A is overwritten by D. A column vector and a row vector are taken from A and inserted into A. Finally some elements are taken from the vector u according to a list and assembled in a vector z, and elements of w are put into positions of the vector u that are defined by a list.

Example Program A.3: The class E3Mtx demonstrates the performance of Givens transformations
First the two vectors u and w are defined. Next, by the use of DatanMatrix.defineGivensTransformation, transformation parameters c and s for the vector u are computed and displayed. The Givens transformation of u with these parameters is performed with DatanMatrix.applyGivensTransformation, yielding

Finally, by calling DatanMatrix.applyGivensTransformation, parameters c and s are computed for the vector w and the transformation is applied to this vector.

Example Program A.4: The class E4Mtx demonstrates the performance of Householder transformations
First the two vectors v and c are defined. Moreover, the indices n, p, ℓ are set to $n = 6$, $p = 3$, $\ell = 5$. By calling DatanMatrix.defineHouseholderTransformation the Householder transformation defined by these indices and the vector v is initialized. The application of this transformation to the vectors v and c is performed by two calls of DatanMatrix.defineHouseholderTransformation. The results are displayed alphanumerically.
Example Program A.5: The class E5Mtx demonstrates the Gaussian algorithm for the solution of matrix equations

The matrix

1  2 
2  1 
1  1 


and the $3\times 3$ unit matrix B are defined. A call of DatanMatrix.matrixEquation solves the matrix equation $AX = B$, i.e., X is the inverse of A. The matrix A is identical to the one chosen in Example A.1. In that example the algorithm is shown step by step.


Example Program A.6: The class E6Mtx demonstrates Cholesky decomposition and Cholesky inversion

First, the matrices

$$B = \begin{pmatrix} 1 & 7 & 13 \\ 3 & 9 & 17 \\ 5 & 11 & 19 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}$$

are defined and the symmetric, positive-definite matrix $S = B^{\mathrm T}B$ is constructed. By calling DatanMatrix.choleskyDecompostion the Cholesky decomposition $S = U^{\mathrm T}U$ is performed and the triangular matrix U is displayed. Multiplication of $U^{\mathrm T}$ by U yields in fact $U^{\mathrm T}U = S$. The method DatanMatrix.choleskyMultiply is then used to compute $R = UC$. Finally, by Cholesky inversion with DatanMatrix.choleskyInversion, the inverse $S^{-1}$ of S is computed. Multiplication with the original matrix yields $SS^{-1} = I$.


Example Program A.7: The class E7Mtx demonstrates the singular value decomposition

The program first operates on the same matrix as E5Mtx. However, the matrix inversion is now performed with DatanMatrix.pseudoInverse.
Next,  the  matrix  is  replaced  by 


This matrix, having two identical columns, is singular. With another call of DatanMatrix.pseudoInverse not only the pseudoinverse matrix but also the residuals, the diagonal matrix D, and the two orthogonal matrices U and V are determined.
Example Program A.8: The class E8Mtx demonstrates the solution of matrix equations by singular value decomposition for 9 different cases

In the framework of our programs, in particular that of least-squares and minimization problems, matrix equations are solved nearly exclusively by singular value decomposition. In Sect. A.5 we have listed the different cases of the matrix equation $Ax \approx b$. If A is an $m\times n$ matrix, then first we must distinguish between the cases $m = n$ (case 1), $m > n$ (case 2), and $m < n$ (case 3). A further subdivision is brought about by the rank of A. If the rank k of A is $k = \min(m, n)$, then we have case 1a (or 2a or 3a). If, however, $k < \min(m, n)$, then we are dealing with case 1b (or 2b or 3b). The rank of a matrix is equal to its number of non-vanishing singular values. In numerical calculations, which are always performed with finite accuracy, one obviously has to define more precisely the meaning of "non-vanishing". For this definition we use the method of singular value analysis (Sect. A.13) and set a singular value equal to zero if it is smaller than a fraction f of the largest singular value. The number of finite singular values remaining in this analysis is called the pseudorank. In addition to the cases mentioned above we also consider the cases 1c, 2c, and 3c, in which the matrix A has full rank but not full pseudorank.

The program consists of two nested loops. The outer loop runs through the cases 1, 2, 3, the inner loop through the subcases a, b, c. For each case the matrix A is composed of individual vectors. In subcase b two of these vectors are chosen to be identical. In subcase c they are identical except for one element, which in one vector differs by $\varepsilon = 10^{-12}$ from the value in the other vector. In case 3 the system of linear equations symbolized by the matrix equation $Ax = b$ has fewer equations (m) than unknowns (n). This case does not appear in practice and is therefore not included in the programs. It is simulated here in the following way. In case 3 with $m = 2$ and $n = 3$, the matrix A is extended to become a $3\times 3$ matrix by the addition of another row, all of whose elements are set to zero, and correspondingly m is set to 3.

If the singular value analysis shows that one or several singular values are smaller than the fraction f of the largest singular value, then they are set to zero. In our example program the analysis is performed twice for each of the 9 cases, first for $f = 10^{-15}$ and then for $f = 10^{-10}$. For $f = 10^{-15}$ the matrix A in our example cases 1c, 2c, 3c has full rank in spite of the small value of $\varepsilon = 10^{-12}$. The singular value analysis with $f = 10^{-10}$ reduces the number of singular values. Note that for cases 1c, 2c, 3c the elements of the solution matrix differ as the choice of f changes. The unwieldy numerical values in the case of $f = 10^{-15}$ show that we are near the limits of numerical stability.

Example Program A.9: The class E9Mtx demonstrates the use of the method DatanMatrix.marquardt

The method DatanMatrix.marquardt will rarely be called directly. It was written to be used in LsqMar and MinMar. For completeness we demonstrate it with a short class. It solves the problem $Ax \approx b$, modified according to (A.17.3), for given A, b, and λ and displays the results $x_1$ and $x_2$.
Example Program A.10: The class E10Mtx demonstrates solution of the least squares problem with constraints by the method DatanMatrix.leastSquaresWithConstraints

The problem of Examples 9.9 and 9.10 is solved, i.e., the measurements $x_1 = 89$, $x_2 = 31$, $x_3 = 62$ of the three angles of a triangle are evaluated using the constraint $x_1 + x_2 + x_3 = 180$. The evaluation requires the solution of $(Ax - b)^2 = \min$ with the constraint $Ex = d$ with

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 89 \\ 31 \\ 62 \end{pmatrix}, \qquad E = (1, 1, 1), \qquad d = 180.$$

In the program the matrices and vectors A, b, E, d are provided. The solution is computed by calling DatanMatrix.leastSquaresWithConstraints. It is, of course, identical to the results found in the previously mentioned examples.
Consider n distinguishable objects $a_1, a_2, \ldots, a_n$. We ask for the number of possible ways $P_k^n$ in which one can place k of them in a given order. Such orderings are called permutations. For the example $n = 4$, $k = 2$ these permutations are

$$\begin{array}{ccc}
a_1a_2, & a_1a_3, & a_1a_4, \\
a_2a_1, & a_2a_3, & a_2a_4, \\
a_3a_1, & a_3a_2, & a_3a_4, \\
a_4a_1, & a_4a_2, & a_4a_3,
\end{array}$$

i.e., $P_2^4 = 12$. The answer for the general problem can be derived from the following scheme. There are n different possible ways to occupy the first place in a sequence. When one of these ways has been chosen, however, there are only $n-1$ objects left, i.e., there remain $n-1$ ways to occupy the second place, and so forth. One therefore has

$$P_k^n = n(n-1)(n-2)\cdots(n-k+1). \tag{B.1}$$

The result can also be written in the form

$$P_k^n = \frac{n!}{(n-k)!}, \tag{B.2}$$

where

$$n! = 1\cdot 2\cdots n; \qquad 0! = 1, \qquad 1! = 1. \tag{B.3}$$

Often one is not interested in the order of the k objects within a permutation (the same k objects can be arranged in $k!$ different ways within the sequence), but rather one only considers the number of different ways of choosing k objects out of a total of n. Such a choice is called a combination. The number of possible combinations of k elements out of n is then

$$C_k^n = \frac{P_k^n}{k!} = \frac{n!}{k!(n-k)!} = \binom{n}{k}. \tag{B.4}$$
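Equations (B.1) and (B.4) translate directly into integer arithmetic. In the hypothetical Java helpers below, $C_k^n$ is accumulated as a running product in which every intermediate value is itself a binomial coefficient, so each division is exact; $0 \le k \le n$ is assumed and overflow of long for large n is not checked.

```java
/** Hypothetical helpers for Eqs. (B.1) and (B.4); 0 <= k <= n assumed, long overflow not checked. */
final class CountingSketch {

    /** P_k^n = n (n - 1) ... (n - k + 1), Eq. (B.1). */
    static long permutations(int n, int k) {
        long p = 1;
        for (int i = 0; i < k; i++) p *= (n - i);
        return p;
    }

    /** C_k^n = P_k^n / k!, Eq. (B.4); after step i the value equals (n - k + i over i). */
    static long combinations(int n, int k) {
        long c = 1;
        for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;  // exact integer division
        return c;
    }

    public static void main(String[] args) {
        System.out.println(permutations(4, 2) + " " + combinations(4, 2)); // the n = 4, k = 2 example
    }
}
```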
Fig. B.1: Pascal's triangle.


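For the binomial coefficients $\binom{n}{k}$ one has the simple recursion relation

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}, \tag{B.5}$$

which can easily be proven by computation:

$$\binom{n-1}{k} + \binom{n-1}{k-1} = \frac{(n-1)!}{k!(n-k-1)!} + \frac{(n-1)!}{(k-1)!(n-k)!} = \frac{(n-k)(n-1)! + k(n-1)!}{k!(n-k)!} = \frac{n!}{k!(n-k)!}.$$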
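The recursion (B.5) generates all binomial coefficients up to a given n using additions only, row by row, which is how Pascal's triangle is built. A minimal Java sketch (the class PascalSketch is hypothetical):

```java
/** Hypothetical sketch: Pascal's triangle from the recursion (B.5). */
final class PascalSketch {

    /** Rows 0..nMax; row n holds the binomial coefficients (n over k), k = 0..n. */
    static long[][] triangle(int nMax) {
        long[][] t = new long[nMax + 1][];
        for (int n = 0; n <= nMax; n++) {
            t[n] = new long[n + 1];
            t[n][0] = 1;
            t[n][n] = 1;
            for (int k = 1; k < n; k++) {
                t[n][k] = t[n - 1][k] + t[n - 1][k - 1];  // recursion (B.5)
            }
        }
        return t;
    }

    public static void main(String[] args) {
        for (long[] row : triangle(6)) System.out.println(java.util.Arrays.toString(row));
    }
}
```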

The recursion formula is the basis of Pascal's famous triangle, which, because of its beauty, is shown in Fig. B.1. The name "binomial coefficient" comes from the well-known binomial theorem,

$$(a + b)^n = \sum_{k=0}^{n}\binom{n}{k}a^kb^{n-k}, \tag{B.6}$$

the proof of which (by induction) is left to the reader. We use the theorem in order to derive a very important property of the coefficient $\binom{n}{k}$. For this we write it in the simple form for $b = 1$, i.e.,

$$(a + 1)^n = \sum_{k=0}^{n}\binom{n}{k}a^k,$$
and we then apply it a second time,

$$(a + 1)^{n+m} = (a + 1)^n(a + 1)^m.$$

If we consider only the term with $a^\ell$, then by comparing coefficients we find

$$\binom{n+m}{\ell} = \sum_{k=0}^{\ell}\binom{n}{k}\binom{m}{\ell-k},$$

where $\binom{n}{k} = 0$ for $k > n$.


C.  Formulas  and  Methods  for  the  Computation  of  Statistical  Functions


C.1  Binomial  Distribution


The binomial distribution (5.1.3)

$$W_k^n = \binom{n}{k}p^k(1-p)^{n-k} \tag{C.1.1}$$

and the distribution function

$$P(k < K) = \sum_{k=0}^{K-1}W_k^n \tag{C.1.2}$$

are computed by the methods StatFunct.binomial and StatFunct.cumulativeBinomial, respectively. For reasons of numerical stability the logarithm of Euler's gamma function is used in the computation.
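The point of working with logarithms is that $n!$ overflows a double long before $W_k^n$ itself becomes unrepresentable. The Java sketch below illustrates this under the assumption $0 < p < 1$; for integer arguments it obtains $\ln n!$ by summing logarithms, a simple stand-in for the log-gamma function used by StatFunct. The class BinomialSketch is hypothetical.

```java
/** Hypothetical sketch of (C.1.1) and (C.1.2) evaluated through logarithms; requires 0 < p < 1. */
final class BinomialSketch {

    /** ln(n!) as a sum of logarithms, a simple stand-in for ln Gamma(n + 1). */
    static double logFactorial(int n) {
        double s = 0.0;
        for (int i = 2; i <= n; i++) s += Math.log(i);
        return s;
    }

    /** W_k^n = (n over k) p^k (1 - p)^(n - k), for 0 <= k <= n. */
    static double binomial(int k, int n, double p) {
        double logW = logFactorial(n) - logFactorial(k) - logFactorial(n - k)
                + k * Math.log(p) + (n - k) * Math.log1p(-p);
        return Math.exp(logW);
    }

    /** P(k < K) = sum of W_k^n for k = 0, ..., K - 1 (terms with k > n are zero). */
    static double cumulativeBinomial(int kLimit, int n, double p) {
        double sum = 0.0;
        for (int k = 0; k < kLimit && k <= n; k++) sum += binomial(k, n, p);
        return sum;
    }

    public static void main(String[] args) {
        // 1000! alone would overflow a double; its logarithm causes no trouble.
        System.out.println(binomial(500, 1000, 0.5));
    }
}
```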


C.2  Hypergeometric  Distribution 


The hypergeometric distribution (5.3.1)

$$W_k = \binom{K}{k}\binom{N-K}{n-k}\Big/\binom{N}{n} \,, \qquad n \le N \,, \quad k \le K \,, \qquad (C.2.1)$$

and the corresponding distribution function

$$P(k < k') = \sum_{k=0}^{k'-1}W_k \qquad (C.2.2)$$

are computed by StatFunct.hypergeometric and StatFunct.cumulativeHypergeometric, respectively.
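Evaluating (C.2.1) directly with factorials overflows quickly. The sketch below works with logarithms of the binomial coefficients instead, summing the logarithms of the factors of the product form of (D.2.2). The class and method names are illustrative only.

```java
/** Hypergeometric distribution (C.2.1) and distribution function (C.2.2); sketch. */
final class HypergeometricSketch {

    /** ln C(n,k) from the product form n(n-1)...(n-k+1) / (k(k-1)...1); -infinity if C = 0. */
    static double lnBinom(int n, int k) {
        if (k < 0 || k > n) return Double.NEGATIVE_INFINITY;
        double s = 0.0;
        for (int i = 1; i <= k; i++) s += Math.log((double) (n - k + i) / i);
        return s;
    }

    /** W_k: k marked items in a sample of n drawn without replacement from N items, K marked. */
    static double hypergeometric(int k, int n, int bigK, int bigN) {
        return Math.exp(lnBinom(bigK, k) + lnBinom(bigN - bigK, n - k) - lnBinom(bigN, n));
    }

    /** P(k < k'): sum of W_k for k = 0, ..., k'-1. */
    static double cumulativeHypergeometric(int kPrime, int n, int bigK, int bigN) {
        double sum = 0.0;
        for (int k = 0; k < kPrime; k++) sum += hypergeometric(k, n, bigK, bigN);
        return sum;
    }
}
```

For N = 10, K = 4 and n = 3 one obtains W_0 = 1/6 and W_1 = 1/2, so P(k < 2) = 2/3.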
C.3 Poisson Distribution

The Poisson distribution (5.4.1)

$$f(k;\lambda) = \frac{\lambda^k}{k!}\,e^{-\lambda} \qquad (C.3.1)$$

and the corresponding distribution function

$$P(k < K) = F(K;\lambda) = \sum_{k=0}^{K-1}f(k;\lambda) \qquad (C.3.2)$$

are computed with the methods StatFunct.poisson and StatFunct.cumulativePoisson, respectively.

The quantities $f(k;\lambda)$ and $F(K;\lambda)$ depend not only on the values of the discrete variables $k$ and $K$, but also on the continuous parameter $\lambda$. For a given $P$ there is a certain parameter value $\lambda_P$ that fulfills Eq. (C.3.2), i.e., for which $F(K;\lambda_P) = P$. We can denote this as the quantile

$$\lambda = \lambda_P(K) \qquad (C.3.3)$$

of the Poisson distribution. It is computed by the method StatFunct.quantilePoisson.
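Since $F(K;\lambda)$ falls monotonically from 1 at $\lambda = 0$ toward 0 for large $\lambda$, the quantile (C.3.3) can be found by interval halving, as described in Sect. E.2. The sketch below is not the StatFunct implementation, and its names are hypothetical. It sums (C.3.2) using the term ratio $f(k+1;\lambda)/f(k;\lambda) = \lambda/(k+1)$. Because it starts from $e^{-\lambda}$, it is only meant for $\lambda$ below about 700, where $e^{-\lambda}$ does not underflow.

```java
/** Poisson distribution function (C.3.2) and quantile lambda_P(K) of (C.3.3); sketch. */
final class PoissonSketch {

    /** F(K; lambda) = sum_{k=0}^{K-1} lambda^k e^{-lambda} / k!, for moderate lambda. */
    static double cumulativePoisson(int bigK, double lambda) {
        double term = Math.exp(-lambda);          // f(0; lambda)
        double sum = 0.0;
        for (int k = 0; k < bigK; k++) {
            sum += term;
            term *= lambda / (k + 1);             // f(k+1) = f(k) lambda / (k+1)
        }
        return sum;
    }

    /** lambda with F(K; lambda) = p, for K >= 1 and 0 < p < 1, by interval halving. */
    static double quantilePoisson(int bigK, double p) {
        double lo = 0.0, hi = 1.0;
        while (cumulativePoisson(bigK, hi) > p) hi *= 2.0;    // F decreases with lambda
        for (int i = 0; i < 200 && hi - lo > 1e-12 * Math.max(1.0, hi); i++) {
            double mid = 0.5 * (lo + hi);
            if (cumulativePoisson(bigK, mid) > p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

For K = 1 the condition reads $e^{-\lambda} = P$, so quantilePoisson(1, 0.5) should reproduce ln 2.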


C.4  Normal  Distribution 


The probability density of the standard normal distribution is

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2) \,. \qquad (C.4.1)$$

It is computed by the method StatFunct.standardNormal.

The normal distribution with mean $x_0$ and variance $\sigma^2$,

$$\varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-x_0)^2}{2\sigma^2}\right) \,, \qquad (C.4.2)$$

can easily be expressed in terms of the standardized variable

$$u = \frac{x - x_0}{\sigma} \qquad (C.4.3)$$

using (C.4.1) to be

$$\varphi(x) = \frac{1}{\sigma}\,\varphi_0(u) \,. \qquad (C.4.4)$$

It is computed by the method StatFunct.normal.

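The distribution function of the standard normal distribution

$$\psi_0(x) = \int_{-\infty}^{x}\varphi_0(x')\,dx' = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\frac{x'^2}{2}\right)dx' \qquad (C.4.5)$$

is an integral that cannot be computed in closed form. We can relate it, however, to the incomplete gamma function described in Sect. D.5.

The distribution function of a normal distribution with mean $x_0$ and variance $\sigma^2$ is found from (C.4.5) to be

$$\psi(x) = \psi_0(u) \,, \qquad u = \frac{x - x_0}{\sigma} \,. \qquad (C.4.6)$$

We now introduce the error function

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \,, \qquad x \ge 0 \,. \qquad (C.4.7)$$

Comparing with the definition of the incomplete gamma function (D.5.1), with the substitution $u = t^2$ and $\Gamma(\frac12) = \sqrt{\pi}$ from (D.1.5), gives

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{t=0}^{x}e^{-t^2}\,dt = \frac{1}{\sqrt{\pi}}\int_{u=0}^{x^2}e^{-u}u^{-1/2}\,du \,,$$

$$\operatorname{erf}(x) = P\left(\tfrac12, x^2\right) \,. \qquad (C.4.8)$$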

On the other hand there is a more direct connection between (C.4.6) and (C.4.7),

$$\psi_{x_0=0,\,\sigma=1/\sqrt2}(x) = \psi_0(u = \sqrt2\,x) = \frac12\left[1 + \operatorname{sign}(x)\operatorname{erf}(|x|)\right]$$

or

$$\psi_0(u) = \frac12\left[1 + \operatorname{sign}(u)\operatorname{erf}\left(\frac{|u|}{\sqrt2}\right)\right] \,,$$

i.e.,

$$\psi_0(u) = \frac12\left[1 + \operatorname{sign}(u)\,P\left(\frac12, \frac{u^2}{2}\right)\right] \,. \qquad (C.4.9)$$

The methods StatFunct.cumulativeStandardNormal and StatFunct.cumulativeNormal yield the distribution functions (C.4.5) and (C.4.6), respectively.

Finally we compute the quantiles of the standard normal distribution (using the method StatFunct.quantileStandardnormal). For a given probability $P$, the quantile $x_P$ is defined by the relation

$$P = \psi_0(x_P) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_P}e^{-x^2/2}\,dx \,. \qquad (C.4.10)$$

We determine this by finding the zero of the function

$$k(x,P) = P - \psi_0(x) \qquad (C.4.11)$$

using the procedure of Sect. E.2.

For the quantile $x_P$ for a probability $P$ of the normal distribution with mean $x_0$ and standard deviation $\sigma$ (computed with StatFunct.quantileNormal) one has

$$P = \psi_0(u_P) \,, \qquad x_P = x_0 + \sigma u_P \,. \qquad (C.4.12)$$
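To illustrate (C.4.9) to (C.4.12), the following sketch computes $\psi_0(u)$ from the error function and then finds $x_P$ by interval halving. It makes two assumptions. First, erf is taken from the series (D.5.2) with $a = 1/2$, written in the positive-term form $\operatorname{erf}(x) = (2/\sqrt\pi)\,e^{-x^2}\sum_n 2^n x^{2n+1}/(1\cdot3\cdots(2n+1))$. Second, it uses the fixed starting interval (−10, 10) instead of the interval search of Sect. E.2. The names are hypothetical.

```java
/** Standard normal distribution function (C.4.9) and quantiles (C.4.10)-(C.4.12); sketch. */
final class NormalSketch {

    /** erf(x) for x >= 0: erf(x) = (2/sqrt(pi)) exp(-x^2) sum_n 2^n x^(2n+1) / (1*3*...*(2n+1)). */
    static double erf(double x) {
        if (x > 6.0) return 1.0;                     // 1 - erf(6) is below 1e-16
        double term = x, sum = 0.0;
        for (int n = 0; n < 1000; n++) {
            sum += term;
            term *= 2.0 * x * x / (2 * n + 3);       // ratio of consecutive terms
            if (term <= 1e-17 * sum) break;
        }
        return 2.0 / Math.sqrt(Math.PI) * Math.exp(-x * x) * sum;
    }

    /** psi_0(u) = (1/2)[1 + sign(u) erf(|u|/sqrt 2)], (C.4.9). */
    static double cumulativeStandardNormal(double u) {
        return 0.5 * (1.0 + Math.signum(u) * erf(Math.abs(u) / Math.sqrt(2.0)));
    }

    /** Quantile x_P: zero of k(x,P) = P - psi_0(x), (C.4.11), for 0 < P < 1. */
    static double quantileStandardNormal(double p) {
        double lo = -10.0, hi = 10.0;                // fixed start interval (assumption)
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (p - cumulativeStandardNormal(mid) > 0.0) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** x_P = x_0 + sigma u_P with psi_0(u_P) = P, (C.4.12). */
    static double quantileNormal(double p, double x0, double sigma) {
        return x0 + sigma * quantileStandardNormal(p);
    }
}
```

For instance, cumulativeStandardNormal(1.96) gives about 0.975, and quantileStandardNormal(0.975) returns about 1.95996.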

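C.5 χ²-Distribution

The probability density (6.6.10) of the $\chi^2$-distribution for $n$ degrees of freedom,

$$f(\chi^2) = \frac{1}{2^\lambda\,\Gamma(\lambda)}\,(\chi^2)^{\lambda-1}e^{-\frac12\chi^2} \,, \qquad \lambda = \frac{n}{2} \,, \qquad (C.5.1)$$

is computed by the method StatFunct.chiSquared. The distribution function

$$F(\chi^2) = \frac{1}{\Gamma(\lambda)}\int_0^{\chi^2}\frac12\left(\frac{u}{2}\right)^{\lambda-1}e^{-\frac{u}{2}}\,du = \frac{1}{\Gamma(\lambda)}\int_0^{\chi^2/2}e^{-t}\,t^{\lambda-1}\,dt \qquad (C.5.2)$$

is seen from (D.5.1) to be an incomplete gamma function

$$F(\chi^2) = P\left(\lambda, \frac{\chi^2}{2}\right) = P\left(\frac{n}{2}, \frac{\chi^2}{2}\right) \qquad (C.5.3)$$

and computed by StatFunct.cumulativeChiSquared.

The quantile $\chi^2_P$ of the $\chi^2$-distribution for a given probability $P$, which is given by

$$h(\chi^2_P) = P - F(\chi^2_P) = 0 \,, \qquad (C.5.4)$$

is computed as the zero of the function $h(\chi^2)$ with StatFunct.quantileChiSquared.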
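The following sketch shows how (C.5.3) and (C.5.4) fit together. $P(a,x)$ is evaluated with the series (D.5.2) for $x < a+1$ and otherwise with the continued fraction (D.5.3), using the normalized recursion of Sect. D.4. The quantile is then found by interval halving. The Lanczos coefficients for ln Γ are those published in Numerical Recipes and are an assumption here, as are all class and method names.

```java
/** Chi-squared distribution function (C.5.3) and quantile (C.5.4); illustrative sketch. */
final class ChiSquaredSketch {

    /** ln Gamma(x), x > 0; Lanczos form with Numerical Recipes coefficients (assumption). */
    static double logGamma(double x) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double ser = 1.000000000190015;
        for (int j = 0; j < c.length; j++) ser += c[j] / (x + j + 1);
        double t = x + 5.5;
        return (x + 0.5) * Math.log(t) - t + Math.log(2.5066282746310005 * ser / x);
    }

    /** Incomplete gamma function P(a,x): series (D.5.2) or continued fraction (D.5.3). */
    static double incompleteGamma(double a, double x) {
        if (x <= 0.0) return 0.0;
        double front = Math.exp(a * Math.log(x) - x - logGamma(a));  // x^a e^-x / Gamma(a)
        if (x < a + 1.0) {
            double term = 1.0 / a, sum = term;       // x^n / (a(a+1)...(a+n))
            for (int n = 1; n < 10000 && term > 1e-17 * sum; n++) {
                term *= x / (a + n);
                sum += term;
            }
            return front * sum;
        }
        // 1/(x+) (1-a)/(1+) 1/(x+) (2-a)/(1+) 2/(x+) ..., normalized recursion of Sect. D.4
        double aPrev = 1.0, bPrev = 0.0, aCur = 0.0, bCur = 1.0, f = 0.0;
        for (int j = 1; j < 10000; j++) {
            double aj = (j == 1) ? 1.0 : (j % 2 == 0) ? j / 2 - a : (j - 1) / 2;
            double bj = (j % 2 == 1) ? x : 1.0;
            double aNew = bj * aCur + aj * aPrev, bNew = bj * bCur + aj * bPrev;
            aPrev = aCur; bPrev = bCur; aCur = aNew; bCur = bNew;
            if (bCur != 0.0) {
                aPrev /= bCur; bPrev /= bCur; aCur /= bCur; bCur = 1.0;
                if (Math.abs(aCur - f) < 1e-14 * Math.abs(aCur)) break;
                f = aCur;
            }
        }
        return 1.0 - front * aCur;
    }

    /** F(chi2) = P(n/2, chi2/2), (C.5.3). */
    static double cumulativeChiSquared(double chi2, int n) {
        return incompleteGamma(0.5 * n, 0.5 * chi2);
    }

    /** Zero of h(chi2) = P - F(chi2), (C.5.4), for 0 < P < 1, by interval halving. */
    static double quantileChiSquared(double p, int n) {
        double lo = 0.0, hi = Math.max(1.0, n);
        while (cumulativeChiSquared(hi, n) < p) { lo = hi; hi *= 2.0; }
        for (int i = 0; i < 200 && hi - lo > 1e-13 * hi; i++) {
            double mid = 0.5 * (lo + hi);
            if (cumulativeChiSquared(mid, n) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

For n = 2, (C.5.3) reduces to $1 - e^{-\chi^2/2}$, which gives a simple check of the code.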


C.6 F-Distribution


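The probability density (8.2.3) of the F-distribution with $f_1$ and $f_2$ degrees of freedom,

$$f(F) = \left(\frac{f_1}{f_2}\right)^{\frac12 f_1}\frac{\Gamma(\frac12(f_1+f_2))}{\Gamma(\frac12 f_1)\,\Gamma(\frac12 f_2)}\,F^{\frac12 f_1-1}\left(1 + \frac{f_1}{f_2}F\right)^{-\frac12(f_1+f_2)} \,, \qquad (C.6.1)$$

is computed with StatFunct.fDistribution. The distribution function

$$F(F) = \frac{\Gamma(\frac12(f_1+f_2))}{\Gamma(\frac12 f_1)\,\Gamma(\frac12 f_2)}\left(\frac{f_1}{f_2}\right)^{\frac12 f_1}\int_0^F F'^{\frac12 f_1-1}\left(1 + \frac{f_1}{f_2}F'\right)^{-\frac12(f_1+f_2)}\,dF' \qquad (C.6.2)$$

can be rearranged using

$$t = \frac{f_2}{f_2 + f_1F'} \,, \qquad |dt| = \frac{f_1 f_2}{(f_2 + f_1F')^2}\,|dF'|$$

to be

$$F(F) = \frac{1}{B(\frac12 f_1, \frac12 f_2)}\int_{t=f_2/(f_2+f_1F)}^{1}(1-t)^{\frac12 f_1-1}\,t^{\frac12 f_2-1}\,dt = 1 - I_{f_2/(f_2+f_1F)}\left(\tfrac12 f_2, \tfrac12 f_1\right) \,, \qquad (C.6.3)$$

i.e., it is related to the incomplete beta function; cf. (D.6.1). We compute it with the method StatFunct.cumulativeFDistribution.

The quantile $F_P$ of the F-distribution for a given probability $P$ is given by the zero of the function

$$h(F) = P - F(F) \,. \qquad (C.6.4)$$

It is computed by StatFunct.quantileFDistribution.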


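C.7 t-Distribution


The probability density (8.3.7) of Student's t-distribution with $f$ degrees of freedom,

$$f(t) = \frac{\Gamma(\frac12(f+1))}{\Gamma(\frac12 f)\,\Gamma(\frac12)\sqrt{f}}\left(1 + \frac{t^2}{f}\right)^{-\frac12(f+1)} = \frac{1}{B(\frac12, \frac{f}{2})\sqrt{f}}\left(1 + \frac{t^2}{f}\right)^{-\frac12(f+1)} \,, \qquad (C.7.1)$$

is computed by StatFunct.Student. By using the substitution

$$u = \frac{f}{f + t^2}$$

the distribution function of the t-distribution can be expressed in terms of the incomplete beta function (D.6.1),

$$F(t) = \frac{1}{B(\frac12,\frac{f}{2})\sqrt{f}}\int_{-\infty}^{t}\left(1 + \frac{t'^2}{f}\right)^{-\frac12(f+1)}dt' = \frac12 + \frac{\operatorname{sign}(t)}{B(\frac12,\frac{f}{2})\sqrt{f}}\int_0^{|t|}\left(1 + \frac{t'^2}{f}\right)^{-\frac12(f+1)}dt' = \frac12 + \frac{\operatorname{sign}(t)}{2B(\frac12,\frac{f}{2})}\int_{u=f/(f+t^2)}^{1}u^{\frac{f}{2}-1}(1-u)^{-\frac12}\,du \,, \qquad (C.7.2)$$

$$F(t) = \frac12\left[1 + \operatorname{sign}(t)\left(1 - I_{f/(f+t^2)}\left(\frac{f}{2}, \frac12\right)\right)\right] \,. \qquad (C.7.3)$$

It is computed by the method StatFunct.cumulativeStudent.

The quantile $t_P$ of the t-distribution for a given probability $P$ is computed by finding the zero of the function

$$h(t) = P - F(t) \qquad (C.7.4)$$

with the method StatFunct.quantileStudent.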
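As a sketch of (C.7.3) and (C.7.4), the code below evaluates the incomplete beta function with the continued fraction (D.6.3) and the symmetry relation (D.6.2), and then solves $h(t) = 0$ by interval halving. It is not the StatFunct code. The names are hypothetical, and ln Γ again uses the Numerical Recipes form of the Lanczos approximation (an assumption).

```java
/** Student's t distribution function (C.7.3) and quantile (C.7.4); illustrative sketch. */
final class StudentSketch {

    /** ln Gamma(x), x > 0; Lanczos form with Numerical Recipes coefficients (assumption). */
    static double logGamma(double x) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double ser = 1.000000000190015;
        for (int j = 0; j < c.length; j++) ser += c[j] / (x + j + 1);
        double t = x + 5.5;
        return (x + 0.5) * Math.log(t) - t + Math.log(2.5066282746310005 * ser / x);
    }

    /** 1/(1+ d1/(1+ d2/(1+ ...))) of (D.6.3), with the normalized recursion of Sect. D.4. */
    static double betaCf(double a, double b, double x) {
        double aPrev = 1.0, bPrev = 0.0, aCur = 0.0, bCur = 1.0, f = 0.0;
        for (int j = 1; j < 10000; j++) {
            double aj = 1.0;                            // a_1 = 1, a_j = d_{j-1} for j >= 2
            if (j >= 2) {
                int i = j - 1, m = i / 2;
                aj = (i % 2 == 0)
                        ? m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
                        : -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1));
            }
            double aNew = aCur + aj * aPrev, bNew = bCur + aj * bPrev;   // all b_j = 1
            aPrev = aCur; bPrev = bCur; aCur = aNew; bCur = bNew;
            if (bCur != 0.0) {
                aPrev /= bCur; bPrev /= bCur; aCur /= bCur; bCur = 1.0;
                if (Math.abs(aCur - f) < 1e-15 * Math.abs(aCur)) break;
                f = aCur;
            }
        }
        return aCur;
    }

    /** I_x(a,b), (D.6.1), using (D.6.2) outside the region (D.6.4). */
    static double incompleteBeta(double a, double b, double x) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double front = Math.exp(a * Math.log(x) + b * Math.log1p(-x)
                + logGamma(a + b) - logGamma(a) - logGamma(b));       // x^a (1-x)^b / B(a,b)
        if (x < (a + 1.0) / (a + b + 2.0)) return front * betaCf(a, b, x) / a;
        return 1.0 - front * betaCf(b, a, 1.0 - x) / b;
    }

    /** F(t) = (1/2)[1 + sign(t)(1 - I_{f/(f+t^2)}(f/2, 1/2))], (C.7.3). */
    static double cumulativeStudent(double t, int f) {
        double x = f / (f + t * t);
        return 0.5 * (1.0 + Math.signum(t) * (1.0 - incompleteBeta(0.5 * f, 0.5, x)));
    }

    /** Quantile t_P: zero of h(t) = P - F(t), (C.7.4), by interval halving. */
    static double quantileStudent(double p, int f) {
        double lo = -1.0, hi = 1.0;
        while (cumulativeStudent(lo, f) > p) lo *= 2.0;
        while (cumulativeStudent(hi, f) < p) hi *= 2.0;
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (cumulativeStudent(mid, f) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

For f = 1 (the Cauchy case), F(1) must equal 3/4. For f = 10, quantileStudent(0.975, 10) gives about 2.228.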


C.8  Java  Class  and  Example  Program 

Java  Class  for  the  Computation  of  Statistical  Functions 

StatFunct  contains  all  methods  mentioned  in  this  Appendix. 

Example Program C.1: The class FunctionsDemo demonstrates all methods mentioned in this Appendix

The  user  first  selects  a  family  of  functions  and  then  a  function  from  that  family. 
Next  the  parameters,  needed  in  the  chosen  case,  are  entered.  After  the  Go  button  is 
clicked,  the  function  value  is  computed  and  displayed. 


D.  The  Gamma  Function  and  Related 
Functions:  Methods  and  Programs 
for  Their  Computation 


D.1 The Euler Gamma Function


Consider a real number $x$ with $x + 1 > 0$. We define the Euler gamma function by

$$\Gamma(x+1) = \int_0^\infty t^x e^{-t}\,dt \,. \qquad (D.1.1)$$

Integrating by parts gives

$$\int_0^\infty t^x e^{-t}\,dt = \left[-t^x e^{-t}\right]_0^\infty + x\int_0^\infty t^{x-1}e^{-t}\,dt = x\int_0^\infty t^{x-1}e^{-t}\,dt \,.$$

Thus one has the relation

$$\Gamma(x+1) = x\,\Gamma(x) \,. \qquad (D.1.2)$$

This is the so-called recurrence relation of the gamma function. From (D.1.1) it follows immediately that

$$\Gamma(1) = 1 \,.$$

With (D.1.2) one then has generally that

$$\Gamma(n+1) = n! \,, \qquad n = 1, 2, \ldots \,. \qquad (D.1.3)$$

We now substitute $t$ by $\frac12 u^2$ (and $dt$ by $u\,du$) and get

$$\Gamma(x+1) = \left(\frac12\right)^x\int_0^\infty u^{2x+1}e^{-\frac12 u^2}\,du \,.$$

If we now choose in particular $x = -\frac12$, we obtain

$$\Gamma\left(\tfrac12\right) = \sqrt2\int_0^\infty e^{-\frac12 u^2}\,du = \frac{1}{\sqrt2}\int_{-\infty}^{\infty}e^{-\frac12 u^2}\,du \,. \qquad (D.1.4)$$

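The integral can be evaluated in the following way. We consider

$$A = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\frac12(x^2+y^2)}\,dx\,dy = \int_{-\infty}^{\infty}e^{-\frac12 x^2}\,dx\int_{-\infty}^{\infty}e^{-\frac12 y^2}\,dy = 2\left\{\Gamma\left(\tfrac12\right)\right\}^2 \,.$$

The integral $A$ can be transformed into polar coordinates:

$$A = \int_0^{2\pi}\int_0^\infty e^{-\frac12 r^2}\,r\,dr\,d\varphi = \int_0^{2\pi}d\varphi\int_0^\infty e^{-\frac12 r^2}\,r\,dr = 2\pi\,\Gamma(1) = 2\pi \,.$$

Setting the two expressions for $A$ equal gives

$$\Gamma\left(\tfrac12\right) = \sqrt{\pi} \,. \qquad (D.1.5)$$

Using (D.1.2) we can thus determine the value of the gamma function for half-integral arguments.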
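Started at (D.1.5), the recurrence (D.1.2) gives $\Gamma(n + \frac12) = (n - \frac12)(n - \frac32)\cdots\frac12\,\sqrt{\pi}$. A few lines of Java suffice; the class name is illustrative.

```java
/** Gamma(n + 1/2) from Gamma(1/2) = sqrt(pi), (D.1.5), and Gamma(x+1) = x Gamma(x), (D.1.2). */
final class HalfIntegerGamma {

    /** Returns Gamma(n + 1/2) for n = 0, 1, 2, ... */
    static double gammaHalf(int n) {
        double g = Math.sqrt(Math.PI);         // Gamma(1/2)
        for (int j = 0; j < n; j++) {
            g *= j + 0.5;                      // Gamma(j + 3/2) = (j + 1/2) Gamma(j + 1/2)
        }
        return g;
    }
}
```

For example, gammaHalf(1) gives $\sqrt\pi/2 \approx 0.8862$ and gammaHalf(2) gives $3\sqrt\pi/4 \approx 1.3293$.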

For arguments that are not positive integers or half-integers, the integral (D.1.1) cannot be evaluated in closed form. In such cases one must rely on approximations. We discuss here the approximation of LANCZOS [17], which is based on the analytic properties of the gamma function. We first extend the definition of the gamma function to negative arguments by means of the reflection formula

$$\Gamma(1-x) = \frac{\pi}{\Gamma(x)\sin(\pi x)} = \frac{\pi x}{\Gamma(1+x)\sin(\pi x)} \,. \qquad (D.1.6)$$


(By relations (D.1.1) and (D.1.6) the gamma function is also defined for an arbitrary complex argument $x$.) One sees immediately that the gamma function has poles at zero and at all negative integer values. The approximation of LANCZOS [17],

$$\Gamma(x+1) = \sqrt{2\pi}\left(x + \gamma + \tfrac12\right)^{x+\frac12}\exp\left(-x - \gamma - \tfrac12\right)\left(A_\gamma(x) + \varepsilon\right) \,, \qquad (D.1.7)$$

takes into account the first few of these poles by the form of the function $A_\gamma$,

$$A_\gamma(x) = c_0 + \frac{c_1}{x+1} + \cdots + \frac{c_{\gamma+1}}{x+\gamma+1} \,. \qquad (D.1.8)$$

For $\gamma = 5$ the error $\varepsilon$ satisfies $|\varepsilon| < 2\cdot10^{-10}$ for all points $x$ in the right half of the complex plane. The method Gamma.gamma yields Euler's gamma function.


Fig. D.1: The functions Γ(x) and 1/Γ(x).


The gamma function is plotted in Fig. D.1. For large positive arguments the gamma function grows so quickly that it is difficult to represent its value in a computer. In many expressions, however, there appear ratios of gamma functions whose values lie in an unproblematic region. In such cases it is better to use the logarithm of the gamma function, which is computed by the method Gamma.logGamma.
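A compact version of (D.1.7) and (D.1.8) with $\gamma = 5$ is sketched below. The coefficients $c_0, \ldots, c_6$ are the values published in Numerical Recipes for this choice of $\gamma$. That they coincide with those used in the class Gamma is an assumption, and the class name is hypothetical. The approximation is evaluated for $\Gamma(x+1)$ with $x > 0$, and the result is divided by $x$ via (D.1.2), so it is only used in the right half-plane. Negative non-integral arguments are handled by the reflection formula (D.1.6).

```java
/** Euler's gamma function via the Lanczos approximation (D.1.7), (D.1.8), gamma = 5; sketch. */
final class LanczosGamma {

    // c_0, ..., c_6 for gamma = 5 (values as published in Numerical Recipes; assumed here)
    private static final double C0 = 1.000000000190015;
    private static final double[] C = {76.18009172947146, -86.50532032941677,
            24.01409824083091, -1.231739572450155, 0.1208650973866179e-2,
            -0.5395239384953e-5};
    private static final double SQRT_2PI = 2.5066282746310005;

    /** ln Gamma(x) for x > 0. */
    static double logGamma(double x) {
        double a = C0;                                   // A_gamma(x) of (D.1.8)
        for (int j = 0; j < C.length; j++) a += C[j] / (x + j + 1);
        double t = x + 5.5;                              // x + gamma + 1/2
        // (D.1.7) gives ln Gamma(x+1); subtract ln x because Gamma(x+1) = x Gamma(x), (D.1.2)
        return (x + 0.5) * Math.log(t) - t + Math.log(SQRT_2PI * a) - Math.log(x);
    }

    /** Gamma(x) for real x; NaN at the poles 0, -1, -2, ... */
    static double gamma(double x) {
        if (x > 0.0) return Math.exp(logGamma(x));
        if (x == Math.rint(x)) return Double.NaN;
        // reflection formula (D.1.6): Gamma(x) Gamma(1-x) = pi / sin(pi x)
        return Math.PI / (Math.sin(Math.PI * x) * Math.exp(logGamma(1.0 - x)));
    }
}
```

This also shows why logGamma is useful: gamma(172.0) overflows to infinity in double precision, while logGamma(172.0) returns about 711.71.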
D.2 Factorial and Binomial Coefficients


The expression

$$n! = 1\cdot2\cdots n \qquad (D.2.1)$$

can be computed either directly as a product or as a gamma function using (D.1.3).

When computing binomial coefficients,

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n}{k}\cdot\frac{n-1}{k-1}\cdots\frac{n-k+1}{1} \,, \qquad (D.2.2)$$

the expression on the right-hand side is preferable for numerical reasons to the expression in the middle and is used in the method Gamma.Binomial.
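The advantage of the right-hand side of (D.2.2) is that no intermediate result is much larger than the final coefficient. In the sketch below, whose names are illustrative, the factors are multiplied in reverse order, $\frac{n-k+1}{1}\cdot\frac{n-k+2}{2}\cdots\frac{n}{k}$. After the $i$th step the running product therefore equals $\binom{n-k+i}{i}$ and remains an integer.

```java
/** Binomial coefficient from the product form on the right of (D.2.2); sketch. */
final class BinomialCoefficient {

    /** C(n,k) as a double; 0 if k < 0 or k > n. */
    static double binomial(int n, int k) {
        if (k < 0 || k > n) return 0.0;
        k = Math.min(k, n - k);                  // C(n,k) = C(n,n-k): use fewer factors
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;             // now c = C(n-k+i, i)
        }
        return c;
    }
}
```

For example, binomial(1000, 500) returns about 2.7·10^299, whereas 1000! itself overflows double precision.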


D.3  Beta  Function 


The beta function has two arguments and is defined by

$$B(z,w) = \int_0^1 t^{z-1}(1-t)^{w-1}\,dt \,. \qquad (D.3.1)$$

The integral can be written in a simple way in terms of gamma functions,

$$B(z,w) = B(w,z) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z+w)} \,. \qquad (D.3.2)$$

In this way the method Gamma.beta computes the beta function. Figure D.2 shows it as a function of $w$ for several fixed values of $z$.
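Because Γ(z) overflows for moderate arguments, (D.3.2) is best evaluated with logarithms. In the minimal sketch below, ln Γ is computed from the Lanczos approximation with the coefficients published in Numerical Recipes (an assumption), and the class name is hypothetical.

```java
/** Beta function B(z,w) = Gamma(z) Gamma(w) / Gamma(z+w), (D.3.2), via logarithms; sketch. */
final class BetaFunctionSketch {

    /** ln Gamma(x), x > 0; Lanczos form with Numerical Recipes coefficients (assumption). */
    static double logGamma(double x) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double ser = 1.000000000190015;
        for (int j = 0; j < c.length; j++) ser += c[j] / (x + j + 1);
        double t = x + 5.5;
        return (x + 0.5) * Math.log(t) - t + Math.log(2.5066282746310005 * ser / x);
    }

    /** B(z,w) for z, w > 0; symmetric in z and w by construction. */
    static double beta(double z, double w) {
        return Math.exp(logGamma(z) + logGamma(w) - logGamma(z + w));
    }
}
```

Simple checks are B(1,1) = 1, B(2,3) = 1/12 and B(1/2,1/2) = π.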


D.4  Computing  Continued  Fractions 


In the next two sections there appear continued fractions, i.e., expressions of the type

$$f = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \cdots}}} \,, \qquad (D.4.1)$$

that can also be written in the typographically simpler form

$$f = b_0 + \frac{a_1}{b_1+}\,\frac{a_2}{b_2+}\,\frac{a_3}{b_3+}\cdots \,. \qquad (D.4.2)$$


Fig. D.2: The beta function. For increasing $z$ the curves $B(z,w)$ are shifted to the left.

If we denote by $f_n$ the value of the fraction (D.4.1) truncated after a finite number of terms up to the coefficients $a_n$ and $b_n$, then one has

$$f_n = \frac{A_n}{B_n} \,. \qquad (D.4.3)$$

The quantities $A_n$ and $B_n$ can be obtained from the following recursion relation,

$$A_{-1} = 1 \,, \quad B_{-1} = 0 \,, \quad A_0 = b_0 \,, \quad B_0 = 1 \,, \qquad (D.4.4)$$

$$A_j = b_jA_{j-1} + a_jA_{j-2} \,, \qquad (D.4.5)$$

$$B_j = b_jB_{j-1} + a_jB_{j-2} \,. \qquad (D.4.6)$$

Since the relations (D.4.5) and (D.4.6) are linear in $A_{j-1}$, $A_{j-2}$ and $B_{j-1}$, $B_{j-2}$, respectively, and since only the ratio $A_n/B_n$ appears in (D.4.3), one can always multiply the coefficients $A_j$, $A_{j-1}$, $A_{j-2}$ and $B_j$, $B_{j-1}$, $B_{j-2}$ by a common, arbitrary normalization factor. One usually chooses $1/B_j$ for this factor and in this way avoids numerical difficulties from very large or very small numbers, which would otherwise occur in the course of the recursion. For steps in which $B_j = 0$, the normalization is not done.

Continued fractions, in a way similar to series expansions, appear in approximations of certain functions. In a region where the approximation for the continued fraction converges, the values of $f_{n-1}$ and $f_n$ for sufficiently large $n$ do not differ much. One can therefore use the following truncation criterion. If for a given $\varepsilon \ll 1$ the inequality

$$\left|\frac{f_n - f_{n-1}}{f_n}\right| < \varepsilon$$

holds, then $f_n$ is a sufficiently good approximation of $f$.
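The recursion (D.4.4) to (D.4.6), the normalization by $1/B_j$ and the truncation criterion can be combined into one generic routine. In the following sketch, whose names are hypothetical, the coefficients $a_j$ and $b_j$ are passed as functions of $j$.

```java
import java.util.function.IntToDoubleFunction;

/** Continued fraction b0 + a1/(b1+ a2/(b2+ ...)) via (D.4.4)-(D.4.6); sketch. */
final class ContinuedFraction {

    /** Evaluates the fraction until |(f_n - f_{n-1}) / f_n| < eps. */
    static double evaluate(IntToDoubleFunction a, IntToDoubleFunction b,
                           double eps, int maxTerms) {
        double aPrev = 1.0, bPrev = 0.0;                 // A_{-1}, B_{-1}
        double aCur = b.applyAsDouble(0), bCur = 1.0;    // A_0 = b_0, B_0 = 1
        double f = aCur;                                 // f_0
        for (int j = 1; j <= maxTerms; j++) {
            double aj = a.applyAsDouble(j), bj = b.applyAsDouble(j);
            double aNew = bj * aCur + aj * aPrev;        // (D.4.5)
            double bNew = bj * bCur + aj * bPrev;        // (D.4.6)
            aPrev = aCur; bPrev = bCur;
            aCur = aNew;  bCur = bNew;
            if (bCur != 0.0) {                           // normalize all four with 1/B_j
                aPrev /= bCur; bPrev /= bCur; aCur /= bCur; bCur = 1.0;
                double fNew = aCur;                      // f_j = A_j / B_j
                if (Math.abs(fNew - f) < eps * Math.abs(fNew)) return fNew;
                f = fNew;
            }
        }
        throw new ArithmeticException("continued fraction did not converge");
    }
}
```

With $b_0 = 1$, $a_j = 1$ and $b_j = 2$ for $j \ge 1$, the fraction converges to $\sqrt2$. With all coefficients equal to 1 it converges to the golden ratio $(1+\sqrt5)/2$.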


D.5  Incomplete  Gamma  Function 


The incomplete gamma function is defined for $a > 0$ by the expression

$$P(a,x) = \frac{1}{\Gamma(a)}\int_0^x e^{-t}\,t^{a-1}\,dt \,. \qquad (D.5.1)$$

It can be expressed as a series expansion

$$P(a,x) = x^ae^{-x}\sum_{n=0}^{\infty}\frac{x^n}{\Gamma(a+n+1)} = \frac{1}{\Gamma(a)}\,x^ae^{-x}\sum_{n=0}^{\infty}\frac{\Gamma(a)}{\Gamma(a+n+1)}\,x^n \,. \qquad (D.5.2)$$

The sum converges quickly for $x < a + 1$. One uses the right-hand form of (D.5.2), not the middle one, since the ratio of the two gamma functions reduces to

$$\frac{\Gamma(a)}{\Gamma(a+n+1)} = \frac{1}{a}\,\frac{1}{a+1}\cdots\frac{1}{a+n} \,.$$

In the region $x > a + 1$ we use the continued fraction

$$1 - P(a,x) = \frac{1}{\Gamma(a)}\,e^{-x}x^a\left(\frac{1}{x+}\,\frac{1-a}{1+}\,\frac{1}{x+}\,\frac{2-a}{1+}\,\frac{2}{x+}\cdots\right) \,. \qquad (D.5.3)$$

The method Gamma.incompleteGamma yields the incomplete gamma function. It is shown in Fig. D.3 for several values of $a$. From the figure one sees immediately that

$$P(a,0) = 0 \,, \qquad (D.5.4)$$

$$\lim_{x\to\infty}P(a,x) = 1 \,. \qquad (D.5.5)$$
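The two branches just described can be sketched as follows. The series uses the right-hand form of (D.5.2), and the continued fraction (D.5.3) is evaluated with the normalized recursion and truncation criterion of Sect. D.4. This is an illustration under stated assumptions (hypothetical names, Numerical Recipes coefficients for ln Γ), not the code of the class Gamma.

```java
/** Incomplete gamma function P(a,x), (D.5.1), via (D.5.2) and (D.5.3); sketch. */
final class IncompleteGammaSketch {

    /** ln Gamma(x), x > 0; Lanczos form with Numerical Recipes coefficients (assumption). */
    static double logGamma(double x) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double ser = 1.000000000190015;
        for (int j = 0; j < c.length; j++) ser += c[j] / (x + j + 1);
        double t = x + 5.5;
        return (x + 0.5) * Math.log(t) - t + Math.log(2.5066282746310005 * ser / x);
    }

    /** Right-hand form of (D.5.2): sum over n of x^n / (a(a+1)...(a+n)). */
    static double series(double a, double x) {
        double term = 1.0 / a, sum = term;
        for (int n = 1; n < 10000 && term > 1e-17 * sum; n++) {
            term *= x / (a + n);
            sum += term;
        }
        return sum;
    }

    /** Bracket of (D.5.3), 1/(x+) (1-a)/(1+) 1/(x+) (2-a)/(1+) 2/(x+) ..., per Sect. D.4. */
    static double continuedFraction(double a, double x) {
        double aPrev = 1.0, bPrev = 0.0;        // A_{-1}, B_{-1}
        double aCur = 0.0, bCur = 1.0;          // A_0 = b_0 = 0, B_0 = 1
        double f = 0.0;
        for (int j = 1; j < 10000; j++) {
            double aj, bj;
            if (j == 1)          { aj = 1.0;         bj = x;   }
            else if (j % 2 == 0) { aj = j / 2 - a;   bj = 1.0; }   // m - a with m = j/2
            else                 { aj = (j - 1) / 2; bj = x;   }   // m with m = (j-1)/2
            double aNew = bj * aCur + aj * aPrev;
            double bNew = bj * bCur + aj * bPrev;
            aPrev = aCur; bPrev = bCur; aCur = aNew; bCur = bNew;
            if (bCur != 0.0) {
                aPrev /= bCur; bPrev /= bCur; aCur /= bCur; bCur = 1.0;
                if (Math.abs(aCur - f) < 1e-14 * Math.abs(aCur)) break;
                f = aCur;
            }
        }
        return aCur;
    }

    /** P(a,x) for a > 0 and x >= 0. */
    static double incompleteGamma(double a, double x) {
        if (x <= 0.0) return 0.0;                                     // (D.5.4)
        double front = Math.exp(a * Math.log(x) - x - logGamma(a));   // x^a e^-x / Gamma(a)
        if (x < a + 1.0) return front * series(a, x);
        return 1.0 - front * continuedFraction(a, x);
    }
}
```

Useful checks are $P(1,x) = 1 - e^{-x}$, which tests both branches, and $P(\frac12, 2) = \operatorname{erf}(\sqrt2) \approx 0.9545$, cf. (C.4.8).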


Fig. D.3: The incomplete gamma function. With increasing $a$ the graphs $P(a,x)$ move to the right.


D.6 Incomplete Beta Function

The incomplete beta function is defined for $a > 0$, $b > 0$ by the relation

$$I_x(a,b) = \frac{1}{B(a,b)}\int_0^x t^{a-1}(1-t)^{b-1}\,dt \,, \qquad 0 \le x \le 1 \,. \qquad (D.6.1)$$


The function obeys the symmetry relation

$$I_x(a,b) = 1 - I_{1-x}(b,a) \,. \qquad (D.6.2)$$

The expression (D.6.1) can be approximated by the following continued fraction:

$$I_x(a,b) = \frac{x^a(1-x)^b}{a\,B(a,b)}\left(\frac{1}{1+}\,\frac{d_1}{1+}\,\frac{d_2}{1+}\cdots\right) \qquad (D.6.3)$$

with

$$d_{2m+1} = -\frac{(a+m)(a+b+m)}{(a+2m)(a+2m+1)}\,x \,, \qquad d_{2m} = \frac{m(b-m)}{(a+2m-1)(a+2m)}\,x \,.$$

The approximation converges quickly for

$$x < \frac{a+1}{a+b+2} \,. \qquad (D.6.4)$$

If this requirement is not fulfilled, then $1 - x$ does not exceed $(b+1)/(a+b+2)$, i.e., the right-hand side of (D.6.4) with $a$ and $b$ interchanged. In this case one computes $I_{1-x}(b,a)$ as a continued fraction and then uses (D.6.2).

The method Gamma.incompleteBeta computes the incomplete beta function. In Fig. D.4 it is displayed for various values of the parameters $a$ and $b$. Regardless of these parameters one has

$$I_0(a,b) = 0 \,, \qquad I_1(a,b) = 1 \,. \qquad (D.6.5)$$
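The following sketch implements the procedure just described. It uses the continued fraction (D.6.3) in the region (D.6.4) and the symmetry relation (D.6.2) elsewhere, and evaluates the fraction with the normalized recursion of Sect. D.4. The names are hypothetical, and ln Γ uses the Numerical Recipes Lanczos coefficients (an assumption).

```java
/** Incomplete beta function I_x(a,b), (D.6.1), via (D.6.2)-(D.6.4); sketch. */
final class IncompleteBetaSketch {

    /** ln Gamma(x), x > 0; Lanczos form with Numerical Recipes coefficients (assumption). */
    static double logGamma(double x) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double ser = 1.000000000190015;
        for (int j = 0; j < c.length; j++) ser += c[j] / (x + j + 1);
        double t = x + 5.5;
        return (x + 0.5) * Math.log(t) - t + Math.log(2.5066282746310005 * ser / x);
    }

    /** 1/(1+ d1/(1+ d2/(1+ ...))) of (D.6.3): b_0 = 0, a_1 = 1, a_j = d_{j-1}, all b_j = 1. */
    static double betaCf(double a, double b, double x) {
        double aPrev = 1.0, bPrev = 0.0, aCur = 0.0, bCur = 1.0, f = 0.0;
        for (int j = 1; j < 10000; j++) {
            double aj = 1.0;
            if (j >= 2) {
                int i = j - 1, m = i / 2;               // coefficient d_i
                aj = (i % 2 == 0)
                        ? m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
                        : -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1));
            }
            double aNew = aCur + aj * aPrev, bNew = bCur + aj * bPrev;
            aPrev = aCur; bPrev = bCur; aCur = aNew; bCur = bNew;
            if (bCur != 0.0) {
                aPrev /= bCur; bPrev /= bCur; aCur /= bCur; bCur = 1.0;
                if (Math.abs(aCur - f) < 1e-15 * Math.abs(aCur)) break;
                f = aCur;
            }
        }
        return aCur;
    }

    /** I_x(a,b) for a, b > 0 and 0 <= x <= 1. */
    static double incompleteBeta(double a, double b, double x) {
        if (x <= 0.0) return 0.0;                                       // (D.6.5)
        if (x >= 1.0) return 1.0;
        double front = Math.exp(a * Math.log(x) + b * Math.log1p(-x)
                + logGamma(a + b) - logGamma(a) - logGamma(b));         // x^a (1-x)^b / B(a,b)
        if (x < (a + 1.0) / (a + b + 2.0)) return front * betaCf(a, b, x) / a;   // (D.6.4)
        return 1.0 - front * betaCf(b, a, 1.0 - x) / b;                 // (D.6.2)
    }
}
```

Closed forms such as $I_x(1,1) = x$, $I_x(2,1) = x^2$ and $I_x(1,2) = 1 - (1-x)^2$ are convenient checks.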


D.7  Java  Class  and  Example  Program 

Java Class for the Computation of the Gamma Function and Related Functions

Gamma  contains  all  methods  mentioned  in  this  Appendix. 


[Figure: four panels showing $I_x(a,b)$ versus $x$ for $b = 0.50, 1.00, 1.50, 2.00$, each with curves for $a = 0.50, 1.00, \ldots, 2.50$]

Fig. D.4: The incomplete beta function. With increasing $a$ the curves $I_x(a,b)$ move further to the right.


Example Program D.1: The class FunctionsDemo demonstrates not only the methods of Appendix C but also all methods mentioned in the present Appendix

The user first selects a family of functions and then a function from that family. Next the parameters, needed in the chosen case, are entered. After the Go button is clicked, the function value is computed and displayed.


E.  Utility  Programs 


E.1 Numerical Differentiation


The derivative $df(x)/dx$ of a function $f(x)$ at the point $x$ is given by the limit

$$f'(x) = \lim_{h\to0}\frac{f(x+h) - f(x)}{h} \,.$$

Obviously one can approximate $f'(x)$ by

$$\frac{f(x+h) - f(x)}{h}$$

for a small finite value of $h$. In fact, for this the symmetrical difference ratio

$$\delta(h) = \frac{f(x+h) - f(x-h)}{2h} \qquad (E.1.1)$$

is more appropriate. This can be seen from the Taylor expansions at the point $x$ for $f(x+h)$ and $f(x-h)$, which give

$$\delta(h) = f'(x) + \frac{h^2}{3!}f'''(x) + \frac{h^4}{5!}f^{(5)}(x) + \cdots \,,$$

in which the leading additional term is already quadratic in $h$. Nevertheless, the choice of $h$ is still critical, since for very small values of $h$ there occur large rounding errors, and for larger values the approximation may not be valid.
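The difference between the two ratios is easily seen numerically. The sketch below, whose names are illustrative, compares them for $f(x) = \sin x$ at $x = 1$, where $f'(1) = \cos 1$. The error of the one-sided ratio decreases roughly in proportion to $h$, and that of (E.1.1) roughly in proportion to $h^2$.

```java
import java.util.function.DoubleUnaryOperator;

/** One-sided versus symmetric difference ratio (E.1.1); illustrative sketch. */
final class DifferenceRatio {

    /** (f(x+h) - f(x)) / h: error of order h. */
    static double forward(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x)) / h;
    }

    /** delta(h) = (f(x+h) - f(x-h)) / (2h), (E.1.1): error of order h^2. */
    static double symmetric(DoubleUnaryOperator f, double x, double h) {
        return (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double exact = Math.cos(1.0);
        for (double h : new double[] {1e-1, 1e-2, 1e-3}) {
            System.out.printf("h=%.0e  forward error=%.2e  symmetric error=%.2e%n", h,
                    forward(Math::sin, 1.0, h) - exact, symmetric(Math::sin, 1.0, h) - exact);
        }
    }
}
```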

One can compute $\delta(h)$ for a monotonic sequence of values $h = h_0, h_1, h_2, \ldots$. If the sequence $\delta(h_0), \delta(h_1), \ldots$ is monotonic (rising or falling), then this is a sign of its convergence to $f'(x)$. According to RUTISHAUSER [27], one can also obtain from the sequence $\delta(h_0), \delta(h_1), \ldots$ other sequences that converge more quickly. The method was modeled after the Romberg procedure [28] for numerical integration. Starting from $h_0 = a$ one first chooses the sequence

$$h_0, h_1, \ldots = a,\ 3a/4,\ a/2,\ 3a/8,\ a/4,\ \ldots \,,$$

sets

$$T_0^{(k)} = \delta(h_k)$$

and computes the additional quantities $T_m^{(k)}$,

$$T_m^{(k)} = \frac{2^m\cdot1.125^{-1}\,T_{m-1}^{(k+1)} - T_{m-1}^{(k)}}{2^m\cdot1.125^{-1} - 1} \,, \qquad m \text{ odd}, \ k \text{ even} \,,$$

$$T_m^{(k)} = \frac{2^m\cdot1.125\,T_{m-1}^{(k+1)} - T_{m-1}^{(k)}}{2^m\cdot1.125 - 1} \,, \qquad m \text{ odd}, \ k \text{ odd} \,,$$

$$T_m^{(k)} = \frac{2^m\,T_{m-1}^{(k+1)} - T_{m-1}^{(k)}}{2^m - 1} \,, \qquad m \text{ even} \,.$$

Arranging the quantities $T_m^{(k)}$ in the form of a triangle,

$$\begin{array}{lllll} T_0^{(0)} & & & & \\ T_0^{(1)} & T_1^{(0)} & & & \\ T_0^{(2)} & T_1^{(1)} & T_2^{(0)} & & \\ T_0^{(3)} & T_1^{(2)} & T_2^{(1)} & T_3^{(0)} & \\ \vdots & & & & \ddots \end{array}$$

we see that the first column contains the sequence of our original difference ratios. Not only does $T_0^{(k)}$ converge to $f'(x)$, but one has in general

$$\lim_{k\to\infty}T_m^{(k)} = f'(x) \,, \qquad \lim_{m\to\infty}T_m^{(0)} = f'(x) \,.$$

The practical significance of the procedure rests on the fact that the columns on the right converge particularly quickly.

In the class AuxDer, the sequence $T_0^{(0)}, T_0^{(1)}, \ldots$ is computed starting from $a = 1$. If it is not monotonic, then $a$ is replaced by $a/10$ and a new sequence is computed. After 10 tries without success the procedure is terminated. If, however, a monotonic sequence is found, the triangle scheme is computed and $T_9^{(0)}$ is given as the best approximation for $f'(x)$. The class AuxDer is similar to the program of KOELBIG [29] with the exception of minor changes in the termination criteria.
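The triangle scheme can be sketched as follows. Instead of distinguishing the three cases explicitly, the code uses the factor $q = (h_k/h_{k+m})^2$. For the sequence $a, 3a/4, a/2, 3a/8, \ldots$ this factor equals $2^m/1.125$ for odd $m$ and even $k$, $2^m\cdot1.125$ for odd $m$ and odd $k$, and $2^m$ for even $m$. The monotonicity test and the retries with $a/10$ performed by AuxDer are omitted, and the names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Derivative by extrapolating symmetric difference ratios in a triangle scheme; sketch. */
final class RutishauserDerivative {

    /** Step sizes h_k = a, 3a/4, a/2, 3a/8, a/4, ... for k = 0, ..., n-1. */
    static double[] steps(double a, int n) {
        double[] h = new double[n];
        for (int k = 0; k < n; k++) {
            double base = a / Math.pow(2.0, k / 2);      // a, a, a/2, a/2, a/4, ...
            h[k] = (k % 2 == 0) ? base : 0.75 * base;
        }
        return h;
    }

    /** Returns T_{n-1}^(0) of the triangle built from delta(h_0), ..., delta(h_{n-1}). */
    static double derivative(DoubleUnaryOperator f, double x, double a, int n) {
        double[] h = steps(a, n);
        double[] t = new double[n];                      // t[k] holds T_m^(k) for current m
        for (int k = 0; k < n; k++) {                    // T_0^(k) = delta(h_k), (E.1.1)
            t[k] = (f.applyAsDouble(x + h[k]) - f.applyAsDouble(x - h[k])) / (2.0 * h[k]);
        }
        for (int m = 1; m < n; m++) {
            for (int k = 0; k + m < n; k++) {            // t[k+1] still holds T_{m-1}^(k+1)
                double r = h[k] / h[k + m];
                double q = r * r;                        // 2^m/1.125, 2^m*1.125 or 2^m
                t[k] = (q * t[k + 1] - t[k]) / (q - 1.0);
            }
        }
        return t[0];
    }
}
```

With a = 1 and n = 10, derivative(Math::exp, 0.0, 1.0, 10) reproduces $e^0 = 1$ to far better accuracy than any single difference ratio in the first column.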

This program requires considerable computing time. For well-behaved functions it is often sufficient to replace the derivative by the difference ratio (E.1.1). To compute second derivatives the procedure of difference ratios is extended correspondingly. The classes AuxDri, AuxGrad and AuxHesse therefore operate on the basis of difference ratios.
E.2 Numerical Determination of Zeros

Computing the quantile $x_P$ of a distribution function $F(x)$ for a given probability $P$ is equivalent to determining the zero of the function

$$k(x) = P - F(x) \,. \qquad (E.2.1)$$

We treat the problem in two steps. In the first step we determine an interval $(x_0, x_1)$ that contains the zero. In the second step we systematically reduce the interval until its size becomes smaller than a given value $\varepsilon$.

In the first step we make use of the fact that $k(x)$ is monotonic, since $F(x)$ is monotonic. We begin with initial values for $x_0$ and $x_1$. If $k(x_0)\,k(x_1) < 0$, i.e., if the function values have different signs, then the zero is contained within the interval. If this is not the case, then we enlarge the interval in the direction where the function has the smaller absolute value, and repeat the procedure with the new values of $(x_0, x_1)$.

For localizing the zero within the initial interval $(x_0, x_1)$ we use a comparatively slow but absolutely reliable procedure. The original interval is divided in half and replaced by the half at whose end points the function values have opposite signs. The procedure is repeated until the interval width decreases below a given value.

This technique is implemented in the class AuxZero. It is also employed directly in several methods for the computation of quantiles in StatFunct, i.e., without a call to AuxZero. An example of the application of AuxZero is the class E1MaxLike; see also Example Program 7.1.
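The two steps can be sketched in Java as follows. The names are hypothetical. The bracketing step doubles the interval width each time, which is one possible way of enlarging it, and sign comparisons use Math.signum to avoid underflow of the product of two small function values.

```java
import java.util.function.DoubleUnaryOperator;

/** Zero of a monotonic function: interval search, then interval halving (Sect. E.2); sketch. */
final class ZeroFinder {

    /** Returns x with k(x) = 0 to within eps, starting from the interval (x0, x1), x0 < x1. */
    static double findZero(DoubleUnaryOperator k, double x0, double x1, double eps) {
        double k0 = k.applyAsDouble(x0), k1 = k.applyAsDouble(x1);
        // Step 1: enlarge toward the end with the smaller |k| until the signs differ.
        int tries = 0;
        while (Math.signum(k0) * Math.signum(k1) > 0.0) {
            if (++tries > 100) throw new ArithmeticException("no sign change found");
            double width = x1 - x0;
            if (Math.abs(k0) < Math.abs(k1)) { x0 -= width; k0 = k.applyAsDouble(x0); }
            else                             { x1 += width; k1 = k.applyAsDouble(x1); }
        }
        if (k0 == 0.0) return x0;
        if (k1 == 0.0) return x1;
        // Step 2: keep the half whose end points give function values of opposite sign.
        for (int i = 0; i < 1000 && x1 - x0 > eps; i++) {
            double xm = 0.5 * (x0 + x1), km = k.applyAsDouble(xm);
            if (km == 0.0) return xm;
            if (Math.signum(k0) != Math.signum(km)) x1 = xm;
            else { x0 = xm; k0 = km; }
        }
        return 0.5 * (x0 + x1);
    }
}
```

For instance, $k(x) = 0.75 - 1/(1 + e^{-x})$ defines the quantile of the logistic distribution for $P = 0.75$. Starting from (−1, 1), the routine first enlarges the interval to the right and then returns ln 3 ≈ 1.0986.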

E.3  Interactive  Input  and  Output  Under  Java 

Java programs usually are event driven, i.e., while running they react to actions by the user. Thus an interaction between user and program is enabled. Its detailed design depends on the problem at hand and also on the user's taste. For our Example Programs four utility programs may suffice to establish simple interactions. They are explained in Fig. E.1.

It shows a screen window produced by the class DatanFrame. In its simplest form it consists only of a frame, a title line (here ‘Example for the creation of random numbers’) and an output region, into which the user's output can be written. The method DatanFrame.add allows additional elements to be added below the title line; they are arranged horizontally starting from the left. In Fig. E.1 these are an input group, a radio-button group and a go button. The input group is created by the class AuxJInputGroup and the radio-button group by AuxJRButtonGroup. Both (as well as DatanFrame) make use of the standard Java Swing classes. The go button is directly created by a standard class. The input group itself is composed of an arbitrary number of number-input regions, arranged vertically one below the other, which are created by the class AuxJNumberInput. The detailed usage of these classes is summarized in the online documentation; we also recommend studying the source code of some of our Example Programs.

[Screenshot: window with the title line "Example for the creation of random numbers", a parameter-input group (seed_1, seed_2), a radio-button group "Random number Generator" (Uniform based on MLCG, Uniform based on ECUY, Standard normal based on ECUY), a go button, and an output region listing random numbers produced by method DatanRandom.ecuy]

Fig. E.1: A window of the type DatanFrame with elements for interactive input, for starting the program, and for alphanumeric output of results.

E.4  Java  Classes 


AuxDer  computes  the  derivative  of  a  function  using  the  Rutishauser  method. 

AuxDri  computes  the  matrix  A  of  derivatives  required  by  LsqNon  and 
LsqMar. 

AuxGrad  computes  the  gradient  of  a  function  at  a  given  point. 

AuxHesse  computes  the  Hessian  matrix  of  a  function  at  a  given  point. 
AuxZero  finds  the  zero  of  a  monotonic  function. 

DatanFrame creates a screen window with possibilities for interactive input and output.

AuxJInputGroup creates an input group within a screen window.

AuxJInputNumber  creates  a  number-input  region  within  an  input  group. 

AuxJRButtonGroup  creates  a  radio-button  group  within  a  screen 
window. 


F.  The  Graphics  Class  DatanGraphics 


F.1 Introductory Remarks

The graphical display of data and of curves of fitted functions has always been an important aid in data analysis. Here we present the class DatanGraphics comprising methods which produce graphics in screen windows and/or postscript files. We distinguish control, transformation, drawing, and auxiliary methods. All will be described in detail in this Appendix and their usage will be explained in a number of Example Programs. For many purposes, however, it is sufficient to use one of only five classes which, in turn, resort to DatanGraphics. These classes, by a single call, produce complete graphical structures; they are listed at the beginning of Sect. F.8.

F.2  Graphical  Workstations:  Control  Routines 

As mentioned, a plot can be output either in the form of a screen window or as a file in postscript format. The latter is easily embedded in digital documents or directly printed on paper, if necessary after conversion to another format such as pdf by the use of a freely available program. For historical reasons we call both the screen window and the postscript file a graphics workstation.

The method DatanGraphics.openWorkstation “opens” a screen window or a file or both, i.e., it initializes buffers into which information is written by methods mentioned later. Only after the method DatanGraphics.closeWorkstation has been called is the window presented on the screen and/or the postscript file made available. In this way several graphics can be produced one after another. These windows can coexist on the screen. They can be changed in size using the computer mouse, but their contents cannot be altered.

S. Brandt, Data Analysis: Statistical and Computational Methods for Scientists and Engineers,
DOI 10.1007/978-3-319-03762-2, © Springer International Publishing Switzerland 2014

[Plot of $f(x) = (2\pi\sigma^2)^{-1/2}\exp(-(x-a)^2/2\sigma^2)$ versus $x$]

Fig. F.1: Simple example of a plot produced with DatanGraphics.

F.3  Coordinate  Systems,  Transformations 
and  Transformation  Methods 

F.3.1  Coordinate  Systems 
World  Coordinates  (WC) 

Figure F.1 shows a plot made by DatanGraphics. Let us imagine for a moment that all of the graphical structures, including text, physically exist, e.g., that they are made out of bent wire. The coordinate system in which this wire structure is described is called the world coordinate system (WC). The coordinates of a point in the WC are denoted by $(X, Y)$.


Computing Coordinates (CC)

If we consider the axes in Fig. F.1, we note that the axes designated with $x$ and $y$ have about the same length in world coordinates, but have very different lengths in terms of the numbers shown. The figure shows a plot of a function

$$y = f(x) \,.$$

Each point $(x, y)$ appears at the point $(X, Y)$. We call the coordinate system of the $(x, y)$ points the computing coordinate system (CC). The transformation between WC and CC is given by Eq. (F.3.1).


Device  Coordinates  (DC) 

From  the  (fictitious)  world  coordinate  system,  the  plot  must  be  brought  onto 
the  working  surface  of  a  graphics  device  (terminal  screen  or  paper).  We  call 
the  coordinates  (u,  v)  on  this  surface  the  device  coordinates  (DC). 


F.3.2  Linear  Transformations:  Window  -  Viewport 

The  concepts  defined  in  this  section  and  the  individual  transformations  are 
illustrated  in  Fig.  F.2. 

Let us assume that the computing coordinates in $x$ cover the range

$$x_a \le x \le x_b \,.$$

The corresponding range in world coordinates is

$$X_a \le X \le X_b \,.$$

A linear transformation $x \to X$ is therefore defined by

$$X = X_a + (x - x_a)\,\frac{X_b - X_a}{x_b - x_a} \,. \qquad (F.3.1)$$

The transformation $y \to Y$ is defined in a corresponding way. One speaks of the mapping of the window in computing coordinates CC,

$$x_a \le x \le x_b \,, \qquad y_a \le y \le y_b \,, \qquad (F.3.2)$$

onto the viewport in world coordinates WC,

$$X_a \le X \le X_b \,, \qquad Y_a \le Y \le Y_b \,. \qquad (F.3.3)$$
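For each coordinate, (F.3.1) is a single linear expression. The minimal sketch below uses hypothetical names that are not part of DatanGraphics. It maps a window (F.3.2) onto a viewport (F.3.3) and checks whether the aspect ratios of the two agree, i.e., whether the mapping is free of distortion.

```java
/** Linear window-to-viewport mapping (F.3.1) for both axes; hypothetical helper class. */
final class WindowToViewport {
    private final double xa, xb, ya, yb;       // window in computing coordinates, (F.3.2)
    private final double wXa, wXb, wYa, wYb;   // viewport in world coordinates, (F.3.3)

    WindowToViewport(double xa, double xb, double ya, double yb,
                     double wXa, double wXb, double wYa, double wYb) {
        this.xa = xa; this.xb = xb; this.ya = ya; this.yb = yb;
        this.wXa = wXa; this.wXb = wXb; this.wYa = wYa; this.wYb = wYb;
    }

    /** X = X_a + (x - x_a)(X_b - X_a)/(x_b - x_a), (F.3.1). */
    double toWorldX(double x) { return wXa + (x - xa) * (wXb - wXa) / (xb - xa); }

    /** The corresponding mapping y -> Y. */
    double toWorldY(double y) { return wYa + (y - ya) * (wYb - wYa) / (yb - ya); }

    /** True if window and viewport have the same aspect ratio (undistorted mapping). */
    boolean isUndistorted(double relTol) {
        double rWindow = (xb - xa) / (yb - ya), rViewport = (wXb - wXa) / (wYb - wYa);
        return Math.abs(rWindow - rViewport) <= relTol * Math.abs(rViewport);
    }
}
```

A window 0 ≤ x ≤ 10, 0 ≤ y ≤ 1 mapped onto the unit square, for example, is distorted: it is compressed by a factor of 10 in x only.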


Fig. F.2: The various coordinate systems. Above: window in computing coordinates. Middle: viewport (small rectangle) and window (large rectangle) in world coordinates. Below: preliminary viewport (dashed rectangle), final adjusted viewport (small rectangle), and border of the display surface (large rectangle) in device coordinates. The mappings from computing coordinates to world coordinates and from world coordinates to device coordinates are indicated by dotted lines.


The mapping is in general distorted, i.e., a square in CC becomes a rectangle in WC. It is undistorted only if the aspect ratios of the window and viewport are equal,

$$\frac{x_b - x_a}{y_b - y_a} = \frac{X_b - X_a}{Y_b - Y_a} \,. \qquad (F.3.4)$$
The user of DatanGraphics first computes, in CC, the values that are to be plotted. By specifying the window and the viewport, he or she defines the mapping into WC. A possible linear distortion is perfectly acceptable in this step, since it provides a simple means of changing the scale.

Next, a mapping onto the physically implemented device coordinates DC must be carried out. It could of course be defined by again providing a window (in WC) and a viewport (in DC). One wants, however, to avoid an additional distortion. We therefore define a viewport

$$ u_a \le u \le u_b \,,\qquad v_a \le v \le v_b \tag{F.3.5} $$

and a window

$$ X'_a \le X \le X'_b \,,\qquad Y'_a \le Y \le Y'_b \,. \tag{F.3.6} $$

The mapping from the window (F.3.6) onto the viewport (F.3.5) is only carried out directly if both have the same aspect ratio. Otherwise the viewport (F.3.5) is reduced symmetrically in width to the right and left, or symmetrically in height above and below, such that the viewport

$$ u'_a \le u \le u'_b \,,\qquad v'_a \le v \le v'_b \tag{F.3.7} $$

has the same aspect ratio as the window (F.3.6). In this way a distortion-free mapping is defined between the two.
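The symmetric reduction of the preliminary viewport can be sketched as follows. The class ViewportFit and its method fit are hypothetical and only illustrate the rule stated above; they are not the DatanGraphics implementation.

```java
/**
 * Sketch of the symmetric viewport reduction leading to (F.3.7).
 * Hypothetical helper, not the DatanGraphics implementation.
 */
class ViewportFit {

    /**
     * Shrinks the preliminary viewport [ua, ub] x [va, vb] (device coordinates)
     * symmetrically so that its width-to-height ratio equals that of the window
     * [wxa, wxb] x [wya, wyb] (world coordinates). Returns {u'a, u'b, v'a, v'b}.
     */
    static double[] fit(double ua, double ub, double va, double vb,
                        double wxa, double wxb, double wya, double wyb) {
        double windowRatio = (wxb - wxa) / (wyb - wya);
        double width = ub - ua;
        double height = vb - va;
        if (width / height > windowRatio) {
            // Viewport too wide: reduce the width symmetrically to the left and right.
            double newWidth = windowRatio * height;
            double center = 0.5 * (ua + ub);
            return new double[] {center - 0.5 * newWidth, center + 0.5 * newWidth, va, vb};
        } else if (width / height < windowRatio) {
            // Viewport too tall: reduce the height symmetrically above and below.
            double newHeight = width / windowRatio;
            double center = 0.5 * (va + vb);
            return new double[] {ua, ub, center - 0.5 * newHeight, center + 0.5 * newHeight};
        }
        return new double[] {ua, ub, va, vb};   // aspect ratios already equal
    }

    public static void main(String[] args) {
        // Window 2 x 1 in WC, preliminary viewport 4 x 3 in DC: the height is reduced.
        double[] v = fit(0.0, 4.0, 0.0, 3.0, 0.0, 2.0, 0.0, 1.0);
        System.out.println(java.util.Arrays.toString(v));
    }
}
```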

F.4  Transformation  Methods 

The user of DatanGraphics must define the transformations between the various coordinate systems by calling the appropriate routines. The transformations are then applied automatically, without any further intervention, whenever a graphical structure is drawn.


Transformation CC → WC

This transformation is defined by calling the following two methods. DatanGraphics.setWindowInComputingCoordinates sets the window in computing coordinates, and DatanGraphics.setViewportInWorldCoordinates sets the viewport in world coordinates.


Transformation WC → DC

This transformation is defined by calling the following two methods. DatanGraphics.setWindowInWorldCoordinates sets the window in world coordinates. DatanGraphics.setFormat defines the temporary viewport in device coordinates. The final viewport is fitted into it so that its width-to-height ratio is the same as that of the window in world coordinates. If DatanGraphics.setFormat is not called at all, the format is taken to be A5 landscape. If the workstation is a screen window, then only the width-to-height ratio is taken into account. In the case of a PostScript file the absolute size in centimeters is valid only if a plot of that size will fit on the paper in the printer. Otherwise the plot is demagnified until it just fits. In both cases the plot is centered on the paper. A call to the method DatanGraphics.setStandardPaperSizeAndBorders informs the program about the paper size. If it is not called, then A4 is assumed with a margin of 5 mm on all four sides.

In  most  cases,  having  defined  the  transformations  in  this  way,  the  user 
will  be  interested  only  in  computing  coordinates. 


Clipping 

Under certain circumstances the graphical structures are not drawn completely. They are truncated if they extend past the boundary of the so-called clipping region; the structures are then said to be clipped. For polylines, markers, data points, and contour lines (Sect. F.5) the clipping region is the window in computing coordinates; for text and graphics utility structures the clipping region is the window in world coordinates. These regions can be set explicitly using DatanGraphics.setSmallClippingWindow and DatanGraphics.setBigClippingWindow, respectively.
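The text does not say which clipping algorithm DatanGraphics uses. As an illustration only, the following Java sketch clips a single polyline segment against a rectangular window such as (F.3.2) with the Liang-Barsky algorithm, a standard technique that may differ from the one actually implemented; the class name SegmentClipper is hypothetical.

```java
/**
 * Liang-Barsky clipping of one line segment against the rectangle
 * xmin <= x <= xmax, ymin <= y <= ymax. Illustrative sketch only.
 */
class SegmentClipper {

    /** Returns {x0', y0', x1', y1'} of the visible part, or null if nothing is visible. */
    static double[] clip(double x0, double y0, double x1, double y1,
                         double xmin, double xmax, double ymin, double ymax) {
        double dx = x1 - x0;
        double dy = y1 - y0;
        // Point P(t) = P0 + t (P1 - P0) is inside boundary i iff t * p[i] <= q[i].
        double[] p = {-dx, dx, -dy, dy};
        double[] q = {x0 - xmin, xmax - x0, y0 - ymin, ymax - y0};
        double tEnter = 0.0, tLeave = 1.0;    // parameter range of the visible part
        for (int i = 0; i < 4; i++) {
            if (p[i] == 0.0) {
                if (q[i] < 0.0) return null;  // parallel to this boundary and outside it
            } else {
                double t = q[i] / p[i];
                if (p[i] < 0.0) {             // segment enters through this boundary
                    if (t > tLeave) return null;
                    if (t > tEnter) tEnter = t;
                } else {                      // segment leaves through this boundary
                    if (t < tEnter) return null;
                    if (t < tLeave) tLeave = t;
                }
            }
        }
        return new double[] {x0 + tEnter * dx, y0 + tEnter * dy,
                             x0 + tLeave * dx, y0 + tLeave * dy};
    }

    public static void main(String[] args) {
        double[] s = clip(-1.0, 0.5, 2.0, 0.5, 0.0, 1.0, 0.0, 1.0);
        System.out.println(java.util.Arrays.toString(s));
    }
}
```

A polyline would be clipped by applying this to each of its segments in turn.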


F.5  Drawing  Methods 

Colors  and  Line  Widths 

The methods mentioned up to this point carry out organizational tasks but do not produce any graphical structures on the workstation. All of the graphical structures created by DatanGraphics consist of lines.

The lines possess a given color and width. These two attributes are selected as follows. A pair of properties (color, line width) is assigned to each of a set of 8 color indices. The set of 8 is different for the screen window and the PostScript file. With the method DatanGraphics.chooseColor the user selects one particular color index, which then remains valid until another color index is chosen. The user may assign his or her own choice of color and line width to a color index with the methods DatanGraphics.setScreenColor and/or DatanGraphics.setPSColor. For the background of the screen window (the standard is blue) another color can be chosen with DatanGraphics.setScreenBackground. For the PostScript file the background is always transparent.


Polylines 

The concepts of a polyline and a polymarker have been introduced for particularly simple graphics structures. A polyline defines a sequence of line segments from the point $(x_1, y_1)$ through the points $(x_2, y_2)$, $(x_3, y_3)$, ... to the point $(x_n, y_n)$. A polyline is drawn with the method DatanGraphics.drawPolyline.

A polymarker marks a plotting point with a graphical symbol. The polymarkers available in DatanGraphics are shown in Fig. F.3.
Fig. F.3: Polymarkers.

One can clearly achieve an arbitrarily good approximation of any graphical structure by means of polylines. For example, graphs of functions can be displayed with polylines as long as the individual points are sufficiently close to each other.
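As a sketch of how a function graph is turned into polyline points, the following Java code samples a normal probability density at equally spaced abscissa values. The resulting arrays are what one would pass to a polyline-drawing method; the drawing call itself is omitted because its signature is not given here, and the class name PolylineSampling is hypothetical.

```java
/** Sketch: sample a normal density at n equally spaced points for use as a polyline. */
class PolylineSampling {

    /** Normal probability density with mean a and standard deviation sigma. */
    static double normalDensity(double x, double a, double sigma) {
        double z = (x - a) / sigma;
        return Math.exp(-0.5 * z * z) / (Math.sqrt(2.0 * Math.PI) * sigma);
    }

    /** Returns {xs, ys}: n >= 2 points from x0 to x1 inclusive. */
    static double[][] sample(double x0, double x1, int n, double a, double sigma) {
        if (n < 2) throw new IllegalArgumentException("need at least two points");
        double[] xs = new double[n];
        double[] ys = new double[n];
        double step = (x1 - x0) / (n - 1);
        for (int i = 0; i < n; i++) {
            xs[i] = x0 + i * step;    // computed from the index, so rounding does not accumulate
            ys[i] = normalDensity(xs[i], a, sigma);
        }
        return new double[][] {xs, ys};
    }

    public static void main(String[] args) {
        // 201 points x = -10, -9.9, ..., 10 of the standardized normal density
        double[][] pts = sample(-10.0, 10.0, 201, 0.0, 1.0);
        System.out.println(pts[0].length + " points, y at x = 0: " + pts[1][100]);
    }
}
```

Whether 201 points are "sufficiently close" depends on the curvature of the function relative to the plotted scale; doubling n is a simple check.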

Sometimes one wants to draw a polyline not as a solid line but rather dashed, dotted, or dot-dashed. That is achieved by the method DatanGraphics.drawBrokenPolyline.

Polymarkers are especially suitable for marking data points. If a data point has measurement errors, then one would like to indicate these errors in one or both coordinates by means of error bars. In certain circumstances one would even like to show the complete covariance ellipse. This task is performed by the method DatanGraphics.drawDataPoint. Examples are shown in Fig. F.4.

An error bar in the x direction is only drawn if $\sigma_x > 0$, and in the y direction only if $\sigma_y > 0$. The covariance ellipse is only drawn if $\sigma_x > 0$, $\sigma_y > 0$, and $\mathrm{cov}(x, y) \ne 0$. Error bars are not drawn if they would lie completely within the polymarker itself. If part of the structure falls outside of the CC window (F.3.2), then this part is not drawn.

Fig. F.4: Example for plotting data points.
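The text does not describe how the covariance ellipse is constructed. One common approach, assumed here for illustration, traces the curve $d^{\rm T} C^{-1} d = 1$, where $d$ is the offset from the data point and $C$ its covariance matrix, by mapping the unit circle through the Cholesky factor of $C$; the result is a closed polyline. The class CovarianceEllipse is hypothetical.

```java
/**
 * Sketch: points on the covariance ellipse of a data point (x0, y0) with errors
 * sx, sy and covariance cov, i.e. the curve d^T C^{-1} d = 1 with
 * C = [[sx^2, cov], [cov, sy^2]]. The Cholesky construction is an assumption
 * for illustration; DatanGraphics may do it differently.
 */
class CovarianceEllipse {

    static double[][] points(double x0, double y0, double sx, double sy, double cov, int n) {
        if (sx <= 0.0 || sy <= 0.0 || Math.abs(cov) >= sx * sy) {
            throw new IllegalArgumentException("covariance matrix must be positive definite");
        }
        // Cholesky factor L of C, lower triangular, with C = L L^T.
        double l11 = sx;
        double l21 = cov / sx;
        double l22 = Math.sqrt(sy * sy - l21 * l21);
        double[] xs = new double[n + 1];
        double[] ys = new double[n + 1];
        for (int i = 0; i <= n; i++) {          // n + 1 points: last equals first, closing the curve
            double t = 2.0 * Math.PI * i / n;
            double c = Math.cos(t);
            double s = Math.sin(t);
            xs[i] = x0 + l11 * c;               // (x, y) = (x0, y0) + L (cos t, sin t)
            ys[i] = y0 + l21 * c + l22 * s;
        }
        return new double[][] {xs, ys};
    }

    public static void main(String[] args) {
        double[][] e = points(1.0, 3.0, 2.0, 1.0, 0.5, 72);
        System.out.println("first point: (" + e[0][0] + ", " + e[1][0] + ")");
    }
}
```

Since $d = Lz$ with $|z| = 1$, one has $d^{\rm T}(LL^{\rm T})^{-1}d = z^{\rm T}z = 1$, so every generated point lies on the ellipse.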


Histogram 

The  method  DatanGraphics. drawHistogram  displays  data  in  the  form 
of  a  histogram. 


Contour  Lines 

A function $f = f(x, y)$ defines a surface in a three-dimensional $(x, y, f)$ space. One can also get an idea of the function in the $(x, y)$ plane by marking points for which $f(x, y)$ is equal to a given constant $c$. The set of all such points forms the contour line $f(x, y) = c$. By drawing a set of such contour lines $f(x, y) = c_1, c_2, \ldots$ one obtains (as with a good topographical map) a rather good impression of the function.

Naturally it is impossible to compute the function for all points in the $(x, y)$ plane. We restrict ourselves to a rectangular region in the $(x, y)$ plane, usually the window in computing coordinates, and we break it into a total of $N = n_x \times n_y$ smaller rectangles. The corner points of these smaller rectangles have x coordinates that are neighboring values in the sequence

$$ x_0,\; x_0 + \Delta x,\; x_0 + 2\Delta x,\; \ldots,\; x_0 + n_x \Delta x \,. $$

The y coordinates of the corner points are adjacent values of the sequence

$$ y_0,\; y_0 + \Delta y,\; y_0 + 2\Delta y,\; \ldots,\; y_0 + n_y \Delta y \,. $$

In each rectangle the contour line is approximated linearly. To do this one considers the function $f(x, y) - c$ at the four corner points. If the function has different signs at the two corner points that are the end points of an edge of the rectangle, then it is assumed that the contour line intersects that edge. The intersection point is computed by linear interpolation. If there are intersection points on two edges of the rectangle, then they are joined by a line segment. If there are intersection points on more than two edges, then all pairs of such points are joined by line segments.
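The per-rectangle step just described can be sketched in Java as follows. The class ContourCell is a hypothetical illustration of the linear interpolation along the edges, not the code of DatanGraphics.drawContour; the corners are visited counterclockwise, and a sign change between the two end points of an edge marks an intersection at parameter $t = g_i/(g_i - g_j)$ along that edge, where $g = f - c$.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

/** Sketch of the contour-line approximation in one small rectangle (hypothetical helper). */
class ContourCell {

    /** Intersections of f(x, y) = c with the edges of one rectangle, by linear interpolation. */
    static List<double[]> crossings(DoubleBinaryOperator f, double c,
                                    double x0, double y0, double dx, double dy) {
        double[][] corners = {{x0, y0}, {x0 + dx, y0}, {x0 + dx, y0 + dy}, {x0, y0 + dy}};
        double[] g = new double[4];
        for (int i = 0; i < 4; i++) {
            g[i] = f.applyAsDouble(corners[i][0], corners[i][1]) - c;
        }
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            int j = (i + 1) % 4;                      // edge from corner i to corner j
            if ((g[i] < 0.0) != (g[j] < 0.0)) {       // sign change: the contour crosses this edge
                double t = g[i] / (g[i] - g[j]);      // zero of the linear interpolant
                points.add(new double[] {
                    corners[i][0] + t * (corners[j][0] - corners[i][0]),
                    corners[i][1] + t * (corners[j][1] - corners[i][1])});
            }
        }
        return points;
    }

    /** Joins all pairs of intersection points by line segments {xa, ya, xb, yb}. */
    static List<double[]> segments(List<double[]> pts) {
        List<double[]> segs = new ArrayList<>();
        for (int i = 0; i < pts.size(); i++) {
            for (int k = i + 1; k < pts.size(); k++) {
                segs.add(new double[] {pts.get(i)[0], pts.get(i)[1], pts.get(k)[0], pts.get(k)[1]});
            }
        }
        return segs;
    }

    public static void main(String[] args) {
        DoubleBinaryOperator f = (x, y) -> Math.sin(x + y) * Math.cos((x - y) / 2.0);
        for (double[] s : segments(crossings(f, 0.5, 0.0, 0.0, 0.5, 0.5))) {
            System.out.printf("segment (%.4f, %.4f) - (%.4f, %.4f)%n", s[0], s[1], s[2], s[3]);
        }
    }
}
```

Looping this over all $n_x \times n_y$ rectangles and collecting the segments gives the approximate contour line for one value of $c$.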

Clearly  the  approximation  of  the  contour  lines  by  line  segments  becomes 
better  for  a  finer  division  into  small  rectangles.  With  a  finer  division,  of 
course,  the  required  computing  time  also  becomes  longer. 

The  method  DatanGraphics.drawContour  computes  and  draws  a 
contour  line.  An  example  of  a  function  represented  by  contour  lines  is  shown 
in  Fig.  F.5. 

F.6  Utility  Methods 

With  the  few  methods  described  up  to  this  point,  a  great  variety  of  complicated 
plots  can  be  produced.  By  using  the  methods  of  this  section  and  the  next,  the 
tasks  are  made  easier  for  the  user,  since  they  help  to  create  graphical  structures 
typically  used  in  conjunction  with  the  plots  of  data  analysis,  such  as  axes, 
coordinate  crosses,  and  explanatory  text. 


Frames 

The method DatanGraphics.drawFrame draws a frame around the plotted part of the world coordinate system, i.e., the outer frame of the plots reproduced here. The method DatanGraphics.drawBoundary, on the other hand, draws a frame around the window of the computing coordinate system.


Scales 

The method DatanGraphics.drawScaleX draws a scale in the x direction. Ticks appear at the upper and lower edges of the CC window, pointing to the inside of the window. Below the lower edge numbers appear, marking some of these ticks. It is recommended to first call the method DatanGraphics.drawBoundary to mark the edges themselves by lines. In addition, an arrow with text can be drawn, pointing in the direction of increasing x values. The method DatanGraphics.drawScaleY performs analogous tasks for an axis in the y direction.

Fig. F.5: Contour lines $f(x, y) = -0.9, -0.8, \ldots, 0.8, 0.9$ of the function $f(x, y) = \sin(x + y)\cos((x - y)/2)$ in the $(x, y)$ plane.

The  creation  of  axis  divisions  and  labels  with  these  methods  is  usually 
done  automatically  without  intervention  of  the  user.  Sometimes,  however,  the 
user  will  want  to  influence  these  operations.  This  can  be  done  by  using  the 
method  DatanGraphics. setParametersForScale.  A  call  to  this 
method  influences  only  that  scale  which  is  generated  by  the  very  next  call  of 
DatanGraphics.drawScaleX  or  DatanGraphics.drawScaleY, 
respectively. 


Coordinate  Cross 

The method DatanGraphics.drawCoordinateCross draws a coordinate cross in the computing coordinate system. The axes of that system appear as broken lines inside the CC window.
F.7 Text Within the Plot

Explanatory text makes plots easier to understand. The methods in this section create text superimposed on a plot, which can be placed at any location.

The text must be supplied by the user as a character string. Before this text is translated into graphics characters, however, it must first be encoded. The simple encoding system used here allows the user to display simple mathematical formulas. For this there are three character sets: Roman, Greek, and mathematics, as shown in Table F.1. The character set is selected by control characters. These are the special characters

@  for  Roman, 

&  for  Greek, 

%  for  mathematics. 

A  control  character  in  the  text  string  causes  all  of  the  following  characters 
to  be  produced  with  the  corresponding  character  set,  until  another  control 
symbol  appears.  The  default  character  set  is  Roman. 
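The character-set switching described above amounts to splitting the string into runs. A minimal Java sketch, assuming Java 16 or later for the record type; the class CharacterSetRuns is hypothetical, and it treats every character other than @, &, and % as ordinary text, so positioning symbols are simply passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split an encoded string into runs of one character set (hypothetical helper). */
class CharacterSetRuns {

    enum CharSet { ROMAN, GREEK, MATH }

    record Run(CharSet set, String text) {}

    static List<Run> split(String s) {
        List<Run> runs = new ArrayList<>();
        CharSet current = CharSet.ROMAN;              // the default character set
        StringBuilder buf = new StringBuilder();
        for (char ch : s.toCharArray()) {
            CharSet next = switch (ch) {
                case '@' -> CharSet.ROMAN;
                case '&' -> CharSet.GREEK;
                case '%' -> CharSet.MATH;
                default -> null;                      // not a control character
            };
            if (next == null) {
                buf.append(ch);
            } else {
                // A control character ends the current run; it is not itself displayed.
                if (buf.length() > 0) runs.add(new Run(current, buf.toString()));
                buf.setLength(0);
                current = next;
            }
        }
        if (buf.length() > 0) runs.add(new Run(current, buf.toString()));
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(split("x = &a@ + 1"));
    }
}
```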

In addition there exist the following positioning symbols:

^ for superscript (exponent),

_ for subscript (index),

# for normal height,

" for backspace.

All characters appear at normal height as long as no positioning symbol has appeared. One can move a maximum of two steps from normal height, e.g., to a superscript of a superscript or to a subscript of a subscript. The positioning symbols ^ and _ remain in effect until the appearance of a #. The symbol " acts only on the character following it. This character then appears over the previous character instead of after it. In this way one obtains, e.g., a superscript placed directly above a subscript rather than after it.

The method DatanGraphics.drawCaption draws a caption, centered slightly below the upper edge of the plotted section of the world coordinate system.

Sometimes the user wants to write text at a certain place in the plot, e.g., next to an individual curve or data point, and also to choose the text size. This is made possible by the method DatanGraphics.drawText.


F.8  Java  Classes  and  Example  Programs 

Java Classes Producing Graphics

DatanGraphics  contains  the  methods  mentioned  in  this  Appendix. 
Table F.1: The various character sets for producing text.


Control characters: @ (Roman), & (Greek), % (Math)

Input   Roman (@)   Greek (&)          Input   Roman (@)   Greek (&)
A       A           Α (ALPHA)          a       a           α (alpha)
B       B           Β (BETA)           b       b           β (beta)
C       C           Χ (CHI)            c       c           χ (chi)
D       D           Δ (DELTA)          d       d           δ (delta)
E       E           Ε (EPSILON)        e       e           ε (epsilon)
F       F           Φ (PHI)            f       f           φ (phi)
G       G           Γ (GAMMA)          g       g           γ (gamma)
H       H           Η (ETA)            h       h           η (eta)
I       I           Ι (IOTA)           i       i           ι (iota)
J       J           Ι (IOTA)           j       j           ι (iota)
K       K           Κ (KAPPA)          k       k           κ (kappa)
L       L           Λ (LAMBDA)         l       l           λ (lambda)
M       M           Μ (MU)             m       m           μ (mu)
N       N           Ν (NU)             n       n           ν (nu)
O       O           Ω (OMEGA)          o       o           ω (omega)
P       P           Π (PI)             p       p           π (pi)
Q       Q           Θ (THETA)          q       q           θ (theta)
R       R           Ρ (RHO)            r       r           ρ (rho)
S       S           Σ (SIGMA)          s       s           σ (sigma)
T       T           Τ (TAU)            t       t           τ (tau)
U       U           Ο (OMICRON)        u       u           ο (omicron)
V       V                              v       v
W       W           Ψ (PSI)            w       w           ψ (psi)
X       X           Ξ (XI)             x       x           ξ (xi)
Y       Y           Υ (UPSILON)        y       y           υ (upsilon)
Z       Z           Ζ (ZETA)           z       z           ζ (zeta)

The digits 0 to 9 and the characters + - { } $ are reproduced unchanged in all three character sets.

[The Greek entries for V and v, the glyphs of the mathematics set (%), and the remaining special-character rows are not legible in this copy.]
GraphicsWithHistogram produces a complete plot with a histogram (an Example Program is E2Sample).

GraphicsWith2DScatterDiagram produces a complete plot with a two-dimensional scatter diagram (an Example Program is E3Sample).

GraphicsWithHistogramAndPolyline produces a complete plot with a histogram and a polyline (an Example Program is E6Gr).

GraphicsWithDataPointsAndPolyline produces a complete plot with data points and one polyline (an Example Program is E7Gr).

GraphicsWithDataPointsAndMultiplePolylines produces a complete plot with data points and several polylines (an Example Program is E8Gr).

Example Program F.1: The class E1Gr demonstrates the use of the following methods of the class DatanGraphics:
openWorkstation, closeWorkstation,
setWindowInComputingCoordinates,
setViewportInWorldCoordinates,
setWindowInWorldCoordinates, setFormat,
drawFrame, drawBoundary, chooseColor,
drawPolyline, drawBrokenPolyline, drawScaleX,
drawScaleY, drawCaption, drawText

The program generates the simple plot of Fig. F.1. It opens the workstation and defines the different coordinate systems. The outer frame (enclosing the section of the world coordinate system to be displayed) and the inner frame (the boundary of the computing coordinate system) are drawn. Next, the lettered scales for abscissa and ordinate are produced, as is a caption for the plot. Now the color index is changed. In a short loop a total of 201 coordinate pairs $(x_i, y_i)$ are computed with $x_i = -10, -9.9, -9.8, \ldots, 10$ and $y_i = f(x_i)$. The function $f(x)$ is the probability density of the standardized normal distribution. A polyline defined by these pairs is drawn. In a second loop the points for a polyline are computed which correspond to a normal distribution with mean $a = 2$ and standard deviation $\sigma = 3$. That polyline is represented as a broken line. Finally, two short straight lines are displayed in the upper left corner of the plot (one as a solid line and one as a dashed line). To the right of each of these lines a short text is displayed, indicating the parameters of the Gaussians displayed as solid and dashed curves, respectively. Before termination of the program the workstation is closed.

Example Program F.2: The class E2Gr demonstrates the use of the method DatanGraphics.drawMark.

The short program generates the plot of Fig. F.3, showing the different polymarkers which can be drawn with DatanGraphics.drawMark.
Fig. F.6: Four versions of the same plot with different types of scales.


Example  Program  F.3:  The  class  E3Gr  demonstrates  the  use  of  the 
method  DatanGraphics.drawDataPoint. 

The  program  produces  the  plot  of  Fig.  F.4,  which  contains  examples  for  the  different 
ways  to  present  data  points  with  errors. 

Example Program F.4: The class E4Gr demonstrates the use of the method DatanGraphics.drawContour

A window of computing coordinates $-\pi \le x \le \pi$, $-\pi \le y \le \pi$ and a square viewport in world coordinates are selected. After creating scales and the caption, input parameters for DatanGraphics.drawContour are prepared. Next, by successive calls of this method in a loop, contours of the function $f(x, y) = \sin(x + y)\cos((x - y)/2)$ are drawn. The result is a plot corresponding to Fig. F.5.

Suggestions: Extend the program such that the parameters ncont and nstep, defining the number of contours and the number of intervals in x and y, can be set interactively by the user. Study the changes in the plot resulting from very small values of nstep.

Example Program F.5: The class E5Gr demonstrates the methods DatanGraphics.setParametersForScale and DatanGraphics.drawCoordinateCross

The program generates the plots shown in Fig. F.6. It contains four plots which differ only in the design of their scales. The plots are generated in a loop where different viewports in world coordinates are chosen in each step, so that the plots correspond to the upper-left, upper-right, lower-left, and lower-right quadrant of the window in world coordinates. For the upper-left plot the default values for the scale design are used. In the upper-right plot the number of ticks and the lettering of the scale are predefined. In the lower-left plot the size of the symbols used in the lettering of the scales is changed. In the lower-right plot the numbers are written in exponential notation. All plots contain a coordinate cross, which is generated by calling DatanGraphics.drawCoordinateCross, and a curve corresponding to a Gaussian.

Fig. F.7: A plot generated with GraphicsWithHistogramAndPolyline containing a histogram and a polyline (abscissa k, ordinate P(k)).

Example Program F.6: The class E6Gr demonstrates the use of the class GraphicsWithHistogramAndPolyline

The program first sets up a histogram which for each bin k contains the Poisson probability $f(k; \lambda)$ for the parameter $\lambda = 10$. Next, points on a polyline are computed corresponding to the probability density of a normal distribution with mean $\lambda$ and variance $\lambda$. Finally the text strings for the plot are defined and the complete plot is displayed by a call of GraphicsWithHistogramAndPolyline (Fig. F.7).
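The numbers that E6Gr plots can be reproduced with a few lines of Java. This is not the code of E6Gr; the class name PoissonVersusNormal is hypothetical, and the Poisson probabilities are computed with the recursion $f(k+1; \lambda) = f(k; \lambda)\,\lambda/(k+1)$ starting from $f(0; \lambda) = e^{-\lambda}$, which avoids large factorials.

```java
/** Sketch: Poisson probabilities and the approximating normal density (hypothetical helper). */
class PoissonVersusNormal {

    /** f(k; lambda) for k = 0..kmax via f(k+1) = f(k) * lambda / (k + 1). */
    static double[] poisson(double lambda, int kmax) {
        double[] f = new double[kmax + 1];
        f[0] = Math.exp(-lambda);
        for (int k = 0; k < kmax; k++) {
            f[k + 1] = f[k] * lambda / (k + 1);
        }
        return f;
    }

    /** Normal probability density with mean lambda and variance lambda. */
    static double normal(double x, double lambda) {
        double d = x - lambda;
        return Math.exp(-d * d / (2.0 * lambda)) / Math.sqrt(2.0 * Math.PI * lambda);
    }

    public static void main(String[] args) {
        double lambda = 10.0;
        double[] f = poisson(lambda, 25);
        for (int k = 0; k <= 25; k++) {
            System.out.printf("%2d  %.5f  %.5f%n", k, f[k], normal(k, lambda));
        }
    }
}
```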

Example Program F.7: The class E7Gr demonstrates the use of the class GraphicsWithDataPointsAndPolyline

First, by calling DatanRandom.line, data points are generated which lie on a straight line $y = at + b$ within the simulated errors. Next, the errors to be presented in the directions of the horizontal and vertical axes and their covariance are defined. The latter two quantities are equal to zero in our example. The polyline defining the straight line consists of only two points. Their computation is trivial. After the definition of the axis labels and the caption, the plot is displayed by calling GraphicsWithDataPointsAndPolyline (Fig. F.8).

Fig. F.8: A plot with data points and a polyline generated by calling GraphicsWithDataPointsAndPolyline.

Example Program F.8: The class E8Gr demonstrates the use of the class GraphicsWithDataPointsAndMultiplePolylines

The program generates 21 data points which lie within the simulated errors on a Gaussian curve with zero mean and standard deviation $\sigma = 1$, and which span the abscissa region $-3 \le x \le 3$. Next, points on three polylines are computed corresponding to Gaussian curves with means of zero and standard deviations $\sigma = 0.5$, $\sigma = 1$, and $\sigma = 1.5$. The polylines span the abscissa region $-10 \le x \le 10$. They are displayed in different colors. One polyline is shown as a continuous line, the other two as dashed lines. Three plots are produced: the first displays only the data points, the second only the polylines, and the third shows the data points together with the polylines. In this way the automatic choice of the scales in the different cases is demonstrated.


G.  Problems,  Hints  and  Solutions, 
and  Programming  Problems 


G.1 Problems

Problem  2.1:  Determination  of  Probabilities 
through  Symmetry  Considerations 

There are n students in a classroom. What is the probability that at least two of them have their birthday on the same day? Solve the problem by working through the following questions:

(a) What is the number N of possibilities to distribute the n birthdays over the year (365 days)?

(b) How large is the number N' of possibilities for which all n birthdays are different?

(c) How large then is the probability $P_{\rm diff}$ that the birthdays are all different?

(d) How large, finally, is the probability P that at least two birthdays coincide?
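To check a hand calculation for parts (c) and (d), the probability can be evaluated numerically as a running product, which avoids forming the huge numbers N and N' separately. The Java sketch below assumes, as the problem does, 365 equally likely days; the class name BirthdayProblem is hypothetical.

```java
/** Sketch: probability that at least two of n people share a birthday (hypothetical helper). */
class BirthdayProblem {

    static double atLeastTwoShare(int n) {
        double pDiff = 1.0;                  // P_diff = N'/N = prod_{i=0}^{n-1} (365 - i)/365
        for (int i = 0; i < n; i++) {
            pDiff *= (365.0 - i) / 365.0;    // the factor becomes 0 for i = 365, so P = 1 for n > 365
        }
        return 1.0 - pDiff;                  // P = 1 - P_diff
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 23, 50}) {
            System.out.printf("n = %d: P = %.4f%n", n, atLeastTwoShare(n));
        }
    }
}
```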

Problem  2.2:  Probability  for  Non-exclusive  Events 

The probabilities P(A), P(B), and P(AB) ≠ 0 for non-exclusive events A and B are given. How large is the probability P(A + B) for the observation of A or B? As an example compute the probability that a playing card which was drawn at random out of a deck of 52 cards is either an ace or a diamond.

Problem  2.3:  Dependent  and  Independent  Events 

Are the events A and B, that a playing card drawn from a deck is an ace or a diamond, respectively, independent

(a) If an ordinary deck of 52 cards is used,

(b) If a joker is added to the deck?
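The card examples of Problems 2.2 and 2.3 can be checked by direct counting, without using any formula for P(A + B). In the following Java sketch (the class name CardCounting is hypothetical) suits and ranks are plain integer labels, and the added joker is assumed to be neither an ace nor a diamond; independence is tested exactly with integers as $N\,n_{AB} = n_A n_B$, which is equivalent to $P(AB) = P(A)P(B)$.

```java
/** Sketch: count aces, diamonds, and their combinations in a deck (hypothetical helper). */
class CardCounting {

    /** Returns {N, n_A, n_B, n_AB, n_(A or B)} with A = ace, B = diamond. */
    static int[] counts(boolean withJoker) {
        int n = 0, nA = 0, nB = 0, nAB = 0, nAorB = 0;
        for (int suit = 0; suit < 4; suit++) {          // suit 0 is labeled "diamonds"
            for (int rank = 1; rank <= 13; rank++) {    // rank 1 is labeled "ace"
                boolean a = rank == 1;
                boolean b = suit == 0;
                n++;
                if (a) nA++;
                if (b) nB++;
                if (a && b) nAB++;
                if (a || b) nAorB++;
            }
        }
        if (withJoker) n++;                             // the joker is neither an ace nor a diamond
        return new int[] {n, nA, nB, nAB, nAorB};
    }

    /** Exact test of P(AB) = P(A) P(B), i.e. N * n_AB == n_A * n_B. */
    static boolean independent(int[] c) {
        return (long) c[0] * c[3] == (long) c[1] * c[2];
    }

    public static void main(String[] args) {
        for (boolean joker : new boolean[] {false, true}) {
            int[] c = counts(joker);
            System.out.printf("N = %d: P(A+B) = %d/%d, independent: %b%n",
                              c[0], c[4], c[0], independent(c));
        }
    }
}
```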


S.  Brandt,  Data  Analysis:  Statistical  and  Computational  Methods  for  Scientists  and  Engineers , 
DOI  10.1007/978-3-319-03762-2,  ©  Springer  International  Publishing  Switzerland  2014 




Problem  2.4:  Complementary  Events 

Show that $\bar{A}$ and $\bar{B}$ are independent if A and B are independent. Use the result of Problem 2.2 to express $P(\bar{A}\bar{B})$ by P(A), P(B), and P(AB).

Problem  2.5:  Probabilities  Drawn  from  Large  and  Small  Populations 

A container holds a large number (> 1000) of coins. They are divided into three types A, B, and C, which make up 20, 30, and 50% of the total.

(a) What are the probabilities P(A), P(B), P(C) of picking a coin of type A, B, or C if one coin is taken at random? What are the probabilities P(AB), P(AC), P(BC), P(AA), P(BB), P(CC), P(2 identical coins), P(2 different coins) for picking 2 coins?

(b) What are the probabilities if 10 coins (2 of type A, 3 of type B, and 5 of type C) are in the container?


Problem  3.1:  Mean,  Variance,  and  Skewness  of  a  Discrete  Distribution 

The throwing of a die yields as possible results $x_i = 1, 2, \ldots, 6$. For an ideally symmetric die one has $p_i = P(x_i) = 1/6$, $i = 1, 2, \ldots, 6$. Determine the expectation value $\hat{x}$, the variance $\sigma^2(x) = \mu_2$, and the skewness $\gamma$ of the distribution,

(a) For an ideally symmetric die,

(b) For a die with

$$ p_1 = \frac{1}{6}\,,\quad p_2 = \frac{1}{12}\,,\quad p_3 = \frac{1}{12}\,,\quad p_4 = \frac{1}{6}\,,\quad p_5 = \frac{3}{12}\,,\quad p_6 = \frac{3}{12}\,. $$
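A numerical check of (a) and (b) is straightforward. The following Java sketch computes the mean, the variance $\mu_2$, and the skewness $\gamma = \mu_3/\sigma^3$ of a discrete distribution from its values and probabilities; the class name DiscreteMoments is hypothetical.

```java
/** Sketch: mean, variance, and skewness of a discrete distribution (hypothetical helper). */
class DiscreteMoments {

    /** Returns {mean, mu_2, gamma} for values x[i] with probabilities p[i]. */
    static double[] moments(double[] x, double[] p) {
        double mean = 0.0;
        for (int i = 0; i < x.length; i++) mean += p[i] * x[i];
        double m2 = 0.0, m3 = 0.0;           // second and third central moments
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean;
            m2 += p[i] * d * d;
            m3 += p[i] * d * d * d;
        }
        return new double[] {mean, m2, m3 / Math.pow(m2, 1.5)};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] symmetric = {1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6};
        double[] loaded = {1.0 / 6, 1.0 / 12, 1.0 / 12, 1.0 / 6, 3.0 / 12, 3.0 / 12};
        System.out.println(java.util.Arrays.toString(moments(x, symmetric)));
        System.out.println(java.util.Arrays.toString(moments(x, loaded)));
    }
}
```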


Problem  3.2:  Mean,  Mode,  Median,  and  Variance 
of  a  Continuous  Distribution 

Consider the probability density f(x) of a triangular distribution of the form shown in Fig. G.1, given by

$$ f(x) = \begin{cases} 0 \,, & x < a \,, \\[4pt] \dfrac{2}{(b - a)(c - a)}\,(x - a) \,, & a \le x \le c \,, \\[8pt] \dfrac{2}{(b - a)(b - c)}\,(b - x) \,, & c \le x \le b \,, \\[4pt] 0 \,, & x > b \,. \end{cases} $$

Determine the mean $\hat{x}$, the mode $x_m$, the median $x_{0.5}$, and the variance $\sigma^2$ of the distribution. For simplicity choose $c = 0$ (which corresponds to the substitution of x by $x' = x - c$). Give explicit results for the symmetric case $a = -b$ and for the case $a = -2b$.
Fig. G.1: Triangular probability density.


Problem  3.3:  Transformation  of  a  Single  Variable 
In Appendix D it is shown that

$$ \int_{-\infty}^{\infty} \exp(-x^2/2)\,dx = \sqrt{2\pi} \,. $$

Use the transformation $y = x/\sigma$ to show that

$$ \int_{-\infty}^{\infty} \exp(-x^2/2\sigma^2)\,dx = \sigma\sqrt{2\pi} \,. $$


Problem  3.4:  Transformation  of  Several  Variables 
A “normal distribution of two variables” (see Sect. 5.10) can take the form

$$ f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right] . $$

(a) Determine the marginal probability densities f(x), f(y) by using the results of Problem 3.3.

(b) Are x and y independent?

(c) Transform the distribution f(x, y) to the variables

$$ u = x\cos\theta + y\sin\theta \,,\qquad v = y\cos\theta - x\sin\theta \,. $$

(The u, v coordinate system has the same origin as the x, y coordinate system, but it is rotated with respect to the latter by an angle θ.)

Hint: Show that the transformation is orthogonal and use (3.8.12).

(d) Show that u and v are independent variables only if θ = 0°, 90°, 180°, 270° or if $\sigma_x = \sigma_y = \sigma$.

(e) Consider the case $\sigma_x = \sigma_y = \sigma$, i.e.,

$$ f(x, y) = \frac{1}{2\pi\sigma^2}\exp\left[-\frac{1}{2\sigma^2}\left(x^2 + y^2\right)\right] . $$

Transform the distribution to polar coordinates r, φ, determine the marginal probability densities g(r) and g(φ), and show that r and φ are independent.


Problem  3.5:  Error  Propagation 

The period T of a pendulum is given by $T = 2\pi\sqrt{\ell/g}$. Here ℓ is the length of the pendulum and g is the gravitational acceleration. Compute g and Δg using the measured values ℓ = 99.8 cm, Δℓ = 0.3 cm, T = 2.03 s, ΔT = 0.05 s and assuming that the measurements of ℓ and T are uncorrelated.
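For a numerical check, note that $g = 4\pi^2\ell/T^2$, so that $\partial g/\partial\ell = g/\ell$ and $\partial g/\partial T = -2g/T$; for uncorrelated ℓ and T the relative errors therefore combine as $(\Delta g/g)^2 = (\Delta\ell/\ell)^2 + (2\Delta T/T)^2$. The Java sketch below evaluates this; the class name PendulumErrorPropagation is hypothetical, and lengths are converted to meters.

```java
/** Sketch: g and its error from pendulum length and period (hypothetical helper). */
class PendulumErrorPropagation {

    /** g = 4 pi^2 l / T^2, from T = 2 pi sqrt(l / g). */
    static double g(double l, double t) {
        return 4.0 * Math.PI * Math.PI * l / (t * t);
    }

    /** Uncorrelated error propagation: (dg/g)^2 = (dl/l)^2 + (2 dT/T)^2. */
    static double deltaG(double l, double dl, double t, double dt) {
        double relative = Math.hypot(dl / l, 2.0 * dt / t);
        return g(l, t) * relative;
    }

    public static void main(String[] args) {
        double l = 0.998, dl = 0.003;   // meters
        double t = 2.03, dt = 0.05;     // seconds
        System.out.printf("g = %.3f +- %.3f m/s^2%n", g(l, t), deltaG(l, dl, t, dt));
    }
}
```

The period term dominates: $2\Delta T/T \approx 0.049$, whereas $\Delta\ell/\ell \approx 0.003$.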

Problem  3.6:  Covariance  and  Correlation 

We denote the mass and the velocity of an object by m and v and their measurement errors by $\Delta m = \sqrt{\sigma^2(m)}$ and $\Delta v = \sqrt{\sigma^2(v)}$. The measurements are assumed to be independent, i.e., $\mathrm{cov}(m, v) = 0$. Furthermore, the relative errors of measurement are known, i.e.,

$$ \Delta m/m = a \,,\qquad \Delta v/v = b \,. $$

(a) Consider the momentum $p = mv$ and the kinetic energy $E = \frac{1}{2}mv^2$ of the object and compute $\sigma^2(p)$, $\sigma^2(E)$, $\mathrm{cov}(p, E)$, and the correlation $\rho(p, E)$. Discuss $\rho(p, E)$ for the special cases $a = 0$ and $b = 0$. Hint: Form the vectors $x = (m, v)$ and $y = (p, E)$. Then approximate $y = y(x)$ by a linear transformation and finally compute the covariance matrix.

(b) For the case where the measured values of E, p and the covariance matrix are known, compute the mass m and its error by error propagation. Use the results from (a) to verify your result. Note that you will obtain the correct result only if $\mathrm{cov}(p, E)$ is taken into account in the error propagation.
Problem 5.1: Binomial Distribution


(a)  Prove  the  recursion  formula 


K+i 


n  —  kp 

- ~K 

k  +  1  q 


(b) It may be known for a certain production process that a fraction $q = 0.2$ of all pieces produced are defective. This means that in 5 pieces produced the expected number of non-defective pieces is $np = n(1 - q) = 5 \cdot 0.8 = 4$. What is the probability $P_2$ and the probability $P_3$ that at most 2 or at most 3 pieces are free from defects? Use relation (a) to simplify the calculation.

(c) Determine the value $k_m$ for which the binomial distribution is maximum, i.e., $k_m$ is the most probable value of the distribution. Hint: Since $W^n_k$ is not a function of a continuous variable $k$, the maximum cannot be found by looking for a zero in the derivative. Therefore, one has to study the finite differences

$$W^n_k - W^n_{k-1}\,.$$

(d) In Sect. 5.1 the binomial distribution was constructed by considering the random variable $x = \sum_{i=1}^{n} x_i$. Here $x_i$ was a random variable that took only the values 0 and 1 with the probabilities $P(x_i = 1) = p$ and $P(x_i = 0) = q$. The binomial distribution

$$f(x) = f(k) = W^n_k = \binom{n}{k}p^kq^{n-k}$$

was then obtained by considering in detail the probability to have $k$ cases of $x_i = 1$ in a total of $n$ observations of the variable $x_i$. Obtain the binomial distribution in a more formal way by constructing the characteristic function $\varphi_{x_i}$ of the variable $x_i$. From the $n$th power of $\varphi_{x_i}$ you obtain the characteristic function of $x$. Hint: Use the binomial theorem (B.6).


Problem  5.2:  Poisson  Distribution 

In a certain hospital the doctor on duty is called on the average three times per night. The number of calls may be considered to be Poisson distributed. What is the probability for the doctor to have a completely quiet night?

Problem  5.3:  Normal  Distribution 

The resistance $R$ of electrical resistors produced by a particular machine may be described by a normal distribution with mean $R_m$ and standard deviation $\sigma$.

The production cost for one resistor is $C$, and the price is $5C$ if $R = R_0 \pm \Delta_1$ (i.e., $|R - R_0| \le \Delta_1$), and $2C$ if $R_0 - \Delta_2 \le R < R_0 - \Delta_1$ or $R_0 + \Delta_1 < R \le R_0 + \Delta_2$. Resistors outside these limits cannot be sold.

(a) Determine the profit $P$ per resistor produced for $R_m = R_0$, $\Delta_1 = a_1R_0$, $\Delta_2 = a_2R_0$, $\sigma = bR_0$. Use the distribution function $\psi_0$.

(b) Use Table I.2 to compute the numerical values of $P$ for $a_1 = 0.01$, $a_2 = 0.05$, $b = 0.05$.

(c) Show that the probability density (5.7.1) has points of inflection (i.e., second derivative equal to zero) at $x = a \pm b$.


Problem  5.4:  Multivariate  Normal  Distribution 

A planar $xy$ coordinate system is used as target. The probability to observe a hit in the plane may be given by the normal distribution of Problem 3.4 (e). Use the result of that problem to determine

(a) the probability $P(R)$ to observe a hit within a given radius $R$ around the origin,

(b) the radius $R$ within which a hit is observed with a given probability $P$. Compute as a numerical example the value of $R$ for $P = 90\,\%$ and $\sigma = 1$.


Problem  5.5:  Convolution 

(a) Prove the relation (5.11.11). Begin with (5.11.9) and use the expression (5.11.10) for $f_y(y)$. In the intervals $0 < u < 1$ and $2 < u < 3$ relations (5.11.10a) and (5.11.10b) hold, since in these intervals one always has $y < 1$ and $y > 1$, respectively. In the interval $1 < u < 2$ the resulting distribution $f(u)$ must be constructed as the sum of two integrals of the type (5.11.9), each of which contains one of the possible expressions for $f_y(y)$. In this case particular care is necessary in the determination of the limits of integration. They are given by the limits of the intervals in which $u$ and $f_y(y)$ are defined.

(b) Prove the relation (5.11.15) by performing the integration (5.11.5) for the case that $f_x$ and $f_y$ are normal distributions with means 0 and standard deviations $\sigma_x$ and $\sigma_y$.


Problem  6.1:  Efficiency  of  Estimators 

Let $x_1, x_2, x_3$ be the elements of a sample from a continuous population with unknown mean $\hat{x}$ but known variance $\sigma^2$.

(a) Show that the following quantities are unbiased estimators of $\hat{x}$,

$$S_1 = \tfrac14 x_1 + \tfrac12 x_2 + \tfrac14 x_3\,,\qquad S_2 = \tfrac15 x_1 + \tfrac25 x_2 + \tfrac25 x_3\,,\qquad S_3 = \tfrac16 x_1 + \tfrac13 x_2 + \tfrac12 x_3\,.$$

Hint: It is simple to show that $S = \sum_{i=1}^{3} a_ix_i$ is unbiased if $\sum_{i=1}^{3} a_i = 1$ holds.

(b) Determine the variances $\sigma^2(S_1)$, $\sigma^2(S_2)$, $\sigma^2(S_3)$ using (3.8.7) and the assumption that the elements $x_1, x_2, x_3$ are independent.

(c) Show that the arithmetic mean $\bar{x} = \frac13 x_1 + \frac13 x_2 + \frac13 x_3$ has the smallest variance of all estimators of the type $S = \sum_{i=1}^{3} a_ix_i$ fulfilling the requirement $\sum_{i=1}^{3} a_i = 1$.

Hint: Minimize the variance of $S = a_1x_1 + a_2x_2 + (1 - a_1 - a_2)x_3$ with respect to $a_1$ and $a_2$. Compute this variance and compare it with the variances which you found in (b).
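A small Java sketch that evaluates $\sum a_i$ and $\sigma^2(S)/\sigma^2 = \sum a_i^2$ for the three estimators and the arithmetic mean, assuming independent $x_i$ with a common variance. The coefficient sets {1/4, 1/2, 1/4}, {1/5, 2/5, 2/5}, {1/6, 1/3, 1/2} are consistent with the variances $0.375\sigma^2$, $0.360\sigma^2$, $0.389\sigma^2$ quoted in the solution; their order within each estimator does not affect the variance. LinearEstimators is a hypothetical name.

```java
/** Hypothetical helper for Problem 6.1: linear estimators S = sum a_i x_i. */
class LinearEstimators {

    /** Sum of the coefficients; S is unbiased iff this equals 1. */
    static double coefficientSum(double[] a) {
        double s = 0.0;
        for (double ai : a) s += ai;
        return s;
    }

    /** sigma^2(S) / sigma^2 = sum a_i^2 for independent x_i with common variance sigma^2. */
    static double relativeVariance(double[] a) {
        double s = 0.0;
        for (double ai : a) s += ai * ai;
        return s;
    }

    public static void main(String[] args) {
        String[] names = {"S1", "S2", "S3", "mean"};
        double[][] coeffs = {
            {1.0 / 4, 1.0 / 2, 1.0 / 4},
            {1.0 / 5, 2.0 / 5, 2.0 / 5},
            {1.0 / 6, 1.0 / 3, 1.0 / 2},
            {1.0 / 3, 1.0 / 3, 1.0 / 3}
        };
        for (int i = 0; i < coeffs.length; i++) {
            System.out.printf("%-4s sum a_i = %.3f  sigma^2(S)/sigma^2 = %.3f%n",
                    names[i], coefficientSum(coeffs[i]), relativeVariance(coeffs[i]));
        }
    }
}
```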


Problem  6.2:  Sample  Mean  and  Sample  Variance 

Compute the sample mean $\bar{x}$, the sample variance $s^2$, and an estimate for the variance of the sample mean $s^2_{\bar{x}} = (1/n)s^2$ for the following sample:

18,  21,  23,  19,  20,  21,  20,  19,  20,  17. 

Use  the  method  of  Example  6.1. 

Problem  6.3:  Samples  from  a  Partitioned  Population 

An opinion poll is performed on an upcoming election. In our (artificially constructed) example the population is partitioned into three subpopulations, and from each subpopulation a preliminary sample of size 10 is drawn. Each element of the sample can have the values 0 (vote for party A) or 1 (vote for party B). The samples are

$i = 1$ ($p_1 = 0.1$): $x_{1j}$ = 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,

$i = 2$ ($p_2 = 0.7$): $x_{2j}$ = 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,

$i = 3$ ($p_3 = 0.2$): $x_{3j}$ = 0, 1, 1, 1, 1, 0, 1, 1, 1, 0.

(a) Use these samples to form an estimator $\tilde{x}$ for the result of the election and its variance $s^2_{\tilde{x}}$. Does this result show a clear advantage for one party?

(b) Use these samples to determine for a much larger sample of size $n$ the sizes $n_i$ of the subsamples in such a way that $\tilde{x}$ has the smallest variance (cf. Example 6.6).


Problem 6.4: $\chi^2$-Distribution

(a) Determine the skewness $\gamma = \mu_3/\sigma^3$ of the $\chi^2$-distribution using (5.5.7). Begin by expressing $\mu_3$ by $\lambda_3$, $\hat{x}$, and $E(x^2)$.

(b) Show that $\gamma \to 0$ for $n \to \infty$.

Problem 6.5: Histogram

Construct  a  histogram  from  the  following  measured  values: 


26.02,  27.13,  24.78,  26.19,  22.57,  25.16,  24.39,  22.73,  25.35,  26.13, 

23.15,  26.29,  24.89,  23.76,  26.04,  22.03,  27.48,  25.42,  24.27,  26.58, 

27.17,  24.22,  21.86,  26.53,  27.09,  24.73,  26.12,  28.04,  22.78,  25.36. 


Use a bin size of $\Delta x = 1$.

Hint: For each value draw a cross of width $\Delta x$ and height 1. In this way you do not need to order the measured values, since each cross is drawn within a bin on top of a preceding cross. Thus the bars of the histogram grow while drawing.


Problem  7.1:  Maximum-Likelihood  Estimates 


(a) Suppose it is known that a random variable $x$ follows the uniform distribution $f(x) = 1/b$ for $0 < x < b$. The parameter $b$ is to be estimated from a sample. Show that $S = \tilde{b} = x_{\max}$ is the maximum-likelihood estimator of $b$. (Hint: This result cannot be obtained by differentiation, but from a simple consideration about the likelihood function.)

(b) Write down the likelihood equations for the two parameters $a$ and $\Gamma$ of the Lorentz distribution (see Example 3.5). Show that these do not necessarily have unique solutions. You can, however, easily convince yourself that for $|x^{(j)} - a| \ll \Gamma$ the arithmetic mean $\bar{x}$ is an estimator of $a$.


Problem  7.2:  Information 


(a) Determine the information $I(\lambda)$ of a sample of size $N$ that was obtained from a normal distribution of known variance $\sigma^2$ but unknown mean $\lambda = a$.

(b) Determine the information $I(\lambda)$ of a sample of size $N$ which was drawn from a normal distribution of known mean $a$ but unknown variance $\lambda = \sigma^2$. Show that the maximum-likelihood estimator of $\sigma^2$ is given by

$$S = \frac{1}{N}\sum_{j=1}^{N}(x^{(j)} - a)^2$$

and that the estimator is unbiased, i.e., $B(S) = E(S) - \lambda = 0$.


Problem  7.3:  Variance  of  an  Estimator 

(a) Use the information inequality to obtain a lower limit on the variance of the estimator of the mean in Problem 7.2 (a). Show that this limit is equal to the minimum variance that was determined in Problem 6.1 (c).

(b) Use the information inequality to obtain a lower limit on the variance of $S$ in Problem 7.2 (b).

(c) Show using Eq. (7.3.12) that $S$ is a minimum variance estimator with the same variance as the lower limit found in (b).

Problem 8.1: F-Test
Two  samples  are  given: 

(1) 21, 19, 14, 27, 25, 23, 22, 18, 21 ($N_1 = 9$),

(2) 16, 24, 22, 21, 25, 21, 18 ($N_2 = 7$).

Does sample (2) have a smaller variance than sample (1) at a significance level of $\alpha = 5\,\%$?


Problem  8.2:  Student’s  Test 

Test the hypothesis that the 30 measurements of Problem 6.5 were drawn from a population with mean 25.5. Use a level of significance of $\alpha = 10\,\%$. Assume that the population is normally distributed.

Problem 8.3: $\chi^2$-Test for Variance

Use the likelihood-ratio method to construct a test of the hypothesis $H_0(\sigma^2 = \sigma_0^2)$ that a sample stems from a normal distribution with unknown mean $a$ and variance $\sigma_0^2$. The parameters are $\lambda = (a, \sigma)$. In $\omega$ one has $\lambda^{(\omega)} = (\bar{x}, \sigma_0)$, in $\Omega$ one has $\lambda^{(\Omega)} = (\bar{x}, s')$.

(a) Form the likelihood ratio $T$.

(b) Show that instead of $T$, the test statistic $T' = Ns'^2/\sigma_0^2$ can be used as well.

(c) Show that $T'$ follows a $\chi^2$-distribution with $N - 1$ degrees of freedom so that the test can be performed with the help of Table I.7.


Problem 8.4: $\chi^2$-Test of Goodness-of-Fit

(a) Determine the mean $\tilde{a}$ and the variance $\tilde{\sigma}^2$ for the histogram of Fig. 6.1b, i.e.,

x_k   193  195  197  199  201  203  205  207  209  211
n_k     1    2    9   12   23   25   11    9    6    2

Use the result of Example 7.8 to construct the estimates. Give the estimators explicitly as functions of $n_k$ and $x_k$.

(b) Perform (at a significance level of $\alpha = 10\,\%$) a $\chi^2$-test on the goodness-of-fit of a normal distribution with mean $\tilde{a}$ and variance $\tilde{\sigma}^2$ to the histogram. Use only those bins of the histogram for which $np_k > 4$. Determine $p_k$ from the difference of two entries in Table I.2. Give a formula for $p_k$ as function of $x_k$, $\Delta x$, $\tilde{a}$, $\tilde{\sigma}^2$, and $\psi_0$. Construct a table for the computation of $\chi^2$ containing columns for $x_k$, $n_k$, $np_k$, and $(n_k - np_k)^2/np_k$.

Problem 8.5: Contingency Table

(a) In an immunology experiment [taken from Sokal and Rohlf, Biometry (Freeman, San Francisco, 1969)] the effect of an antiserum on a particular type of bacteria is studied. 57 mice received a certain dose of bacteria and antiserum, whereas 54 mice received bacteria only. After some time the mice of both groups were counted and the following contingency table constructed.

                           Dead   Alive   Sum
Bacteria and antiserum       13      44    57
Only bacteria                25      29    54
Sum                          38      73   111 (total)

Test (at a significance level of $\alpha = 10\,\%$) the hypothesis that the antiserum has no influence on the survival probability.

(b) In the computation of $\chi^2$ in (a) you will have noticed that the numerators in (8.8.1) all have the same value. Show that generally for $2 \times 2$ contingency tables the following relation holds,

$$n_{ij} - n\tilde{p}_i\tilde{q}_j = \pm\frac{1}{n}(n_{11}n_{22} - n_{12}n_{21})\,.$$

Problem 2.1

(a) $N = 365^n$.

(b) $N' = 365 \cdot 364 \cdots (365 - n + 1)$.

(c) $P_{\rm diff} = N'/N = \dfrac{364}{365}\cdot\dfrac{363}{365}\cdots\dfrac{365 - n + 1}{365}$.

(d) $P = 1 - P_{\rm diff}$.

Putting in numbers one obtains $P \approx 0.5$ for $n = 23$ and $P \approx 0.99$ for $n = 57$.
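The product in (c) is easy to evaluate numerically. A minimal Java sketch, assuming 365 equally likely birthdays as in the solution; Birthday.pSame is a hypothetical name.

```java
/** Hypothetical helper for Problem 2.1: probability of at least one shared birthday. */
class Birthday {

    /** P = 1 - P_diff with P_diff = (365/365)(364/365)...((365 - n + 1)/365). */
    static double pSame(int n) {
        double pDiff = 1.0;
        for (int k = 0; k < n; k++) {
            pDiff *= (365.0 - k) / 365.0;
        }
        return 1.0 - pDiff;
    }

    public static void main(String[] args) {
        System.out.printf("n = 23: P = %.3f%n", pSame(23));
        System.out.printf("n = 57: P = %.3f%n", pSame(57));
    }
}
```

For $n = 23$ this gives about 0.507 and for $n = 57$ about 0.990, matching the numbers quoted above.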

Problem 2.2

$P(A + B) = P(A) + P(B) - P(AB)$,

$P(\text{ace or diamond}) = P(\text{ace}) + P(\text{diamond}) - P(\text{ace and diamond}) = \dfrac{4}{52} + \dfrac{13}{52} - \dfrac{1}{52} = \dfrac{4}{13}$.

Problem 2.3

(a)  P(A)  =  A  ,  P(B )  =  H  ,  P(AB)  =  A,  i.e.,  P(AB)  =  P(A)P(B) . 

(b) P(A) = A , P(B ) = H , P(AB ) = 1 i.e., $P(AB) \neq P(A)P(B)$.

Problem 2.4

$P(\bar{A}\bar{B}) = 1 - P(A + B) = 1 - P(A) - P(B) + P(AB)$.

For $A$ and $B$ independent one has $P(AB) = P(A)P(B)$. Therefore,

$$P(\bar{A}\bar{B}) = 1 - P(A) - P(B) + P(A)P(B) = (1 - P(A))(1 - P(B)) = P(\bar{A})P(\bar{B})\,.$$


Problem 2.5

(a) $P(A) = 0.2$, $P(B) = 0.3$, $P(C) = 0.5$,

$P(AB) = 2 \cdot 0.2 \cdot 0.3$, $P(AC) = 2 \cdot 0.2 \cdot 0.5$, $P(BC) = 2 \cdot 0.3 \cdot 0.5$,

$P(AA) = 0.2^2$, $P(BB) = 0.3^2$, $P(CC) = 0.5^2$.

(b) $P(A) = 2/10 = 0.2$, $P(B) = 3/10 = 0.3$, $P(C) = 5/10 = 0.5$,

$P(AB) = \frac{2}{10}\cdot\frac{3}{9} + \frac{3}{10}\cdot\frac{2}{9}$, $P(AC) = \frac{2}{10}\cdot\frac{5}{9} + \frac{5}{10}\cdot\frac{2}{9}$, $P(BC) = \frac{3}{10}\cdot\frac{5}{9} + \frac{5}{10}\cdot\frac{3}{9}$,

$P(AA) = \frac{2}{10}\cdot\frac{1}{9}$, $P(BB) = \frac{3}{10}\cdot\frac{2}{9}$, $P(CC) = \frac{5}{10}\cdot\frac{4}{9}$.

For (a) and (b) it holds that

$P(\text{2 identical coins}) = P(AA) + P(BB) + P(CC)$,

$P(\text{2 different coins}) = P(AB) + P(AC) + P(BC) = 1 - P(\text{2 identical coins})$.
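Both cases can be reproduced by summing over ordered draws. This Java sketch assumes, as the fractions 2/10, 3/10, 5/10 and the factors with denominator 9 suggest, that there are 2, 3 and 5 coins of types A, B and C and that two coins are drawn with replacement in (a) and without replacement in (b); the problem statement itself is not part of this section. TwoCoins is a hypothetical name.

```java
/** Hypothetical helper for Problem 2.5: two coins drawn from a collection of coin types. */
class TwoCoins {

    /** P(first coin of type i, second of type j); without replacement the first coin is removed. */
    static double pOrdered(int[] counts, int i, int j, boolean withReplacement) {
        int n = 0;
        for (int c : counts) n += c;
        double first = (double) counts[i] / n;
        if (withReplacement) return first * counts[j] / n;
        int remaining = (i == j) ? counts[j] - 1 : counts[j];
        return first * remaining / (n - 1);
    }

    static double pIdentical(int[] counts, boolean withReplacement) {
        double p = 0.0;
        for (int i = 0; i < counts.length; i++) p += pOrdered(counts, i, i, withReplacement);
        return p;
    }

    static double pDifferent(int[] counts, boolean withReplacement) {
        double p = 0.0;
        for (int i = 0; i < counts.length; i++)
            for (int j = 0; j < counts.length; j++)
                if (i != j) p += pOrdered(counts, i, j, withReplacement);  // both orders, e.g. AB and BA
        return p;
    }

    public static void main(String[] args) {
        int[] counts = {2, 3, 5};   // coins of type A, B, C
        for (boolean repl : new boolean[] {true, false}) {
            System.out.printf("with replacement = %b: identical %.4f, different %.4f%n",
                    repl, pIdentical(counts, repl), pDifferent(counts, repl));
        }
    }
}
```

In both cases the two probabilities add up to 1, which is the relation stated at the end of the solution.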

Problem 3.1

(a) $\hat{x} = \sum_{i=1}^{6} x_ip_i = \frac16\sum_{i=1}^{6} i = \frac{21}{6} = 3.5$,

$\sigma^2(x) = \sum_{i=1}^{6}(x_i - \hat{x})^2p_i = \frac16(2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2) = \frac13(6.25 + 2.25 + 0.25) = 2.92$,

$\mu_3 = \sum_{i=1}^{6}(x_i - \hat{x})^3p_i = \frac16\sum_{i=1}^{6}(i - 3.5)^3 = 0$,

$\gamma = \mu_3/\sigma^3 = 0$.

(b) $\hat{x} = \frac{1}{12}(2 + 2 + 3 + 8 + 15 + 18) = 4$,

$\sigma^2(x) = \frac{1}{12}(2 \cdot 3^2 + 1 \cdot 2^2 + 1 \cdot 1^2 + 3 \cdot 1^2 + 3 \cdot 2^2) = \frac{38}{12} = 3.167$,

$\mu_3 = \frac{1}{12}(-2 \cdot 3^3 - 1 \cdot 2^3 - 1 \cdot 1^3 + 3 \cdot 1^3 + 3 \cdot 2^3) = -3$,

$\gamma = \mu_3/\sigma^3 = -3/3.167^{3/2} = -0.532$.
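A Java sketch computing mean, variance, $\mu_3$ and skewness for both dice. The probabilities used for (b), $(2, 1, 1, 2, 3, 3)/12$ for the faces 1 to 6, are an assumption read off from the terms of the solution (e.g. $\hat{x} = (2 + 2 + 3 + 8 + 15 + 18)/12$), since the problem statement is not part of this section. DiscreteMoments is a hypothetical name.

```java
/** Hypothetical helper for Problem 3.1: moments of a discrete distribution. */
class DiscreteMoments {

    /** Returns {mean, variance, mu3, skewness} for values x with probabilities p. */
    static double[] moments(double[] x, double[] p) {
        double mean = 0.0;
        for (int i = 0; i < x.length; i++) mean += x[i] * p[i];
        double var = 0.0, mu3 = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean;
            var += d * d * p[i];
            mu3 += d * d * d * p[i];
        }
        return new double[] {mean, var, mu3, mu3 / Math.pow(var, 1.5)};
    }

    public static void main(String[] args) {
        double[] faces = {1, 2, 3, 4, 5, 6};
        double[] fair = {1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0};
        double[] loaded = {2 / 12.0, 1 / 12.0, 1 / 12.0, 2 / 12.0, 3 / 12.0, 3 / 12.0};  // assumed
        for (double[] p : new double[][] {fair, loaded}) {
            double[] m = moments(faces, p);
            System.out.printf("mean %.3f, variance %.3f, mu3 %.3f, skewness %.3f%n",
                    m[0], m[1], m[2], m[3]);
        }
    }
}
```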


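Problem 3.2

For $a = -b$: $\hat{x} = 0$, $x_{0.5} = 0$, $\sigma^2(x) = b^2/6$.

For $a = -2b$: $\hat{x} = -b/3 = -0.33b$, $x_{0.5} = b(\sqrt{3} - 2) = -0.27b$, $\sigma^2(x) = \frac{7}{18}b^2$.

Problem 3.3

$g(y) = \exp(-y)$, $y(x) = x/\sigma$, $\dfrac{dy(x)}{dx} = \dfrac{1}{\sigma}$,

$f(x) = \left|\dfrac{dy(x)}{dx}\right|g(y(x)) = \dfrac{1}{\sigma}\exp\left(-\dfrac{x}{\sigma}\right)$.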


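Problem 3.4

(a) $f_x(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma_x}\exp\left(-\dfrac{x^2}{2\sigma_x^2}\right)$, $\quad f_y(y) = \dfrac{1}{\sqrt{2\pi}\,\sigma_y}\exp\left(-\dfrac{y^2}{2\sigma_y^2}\right)$.

(b) Yes, since (3.4.6) is fulfilled.

(c) The transformation can be written in the form

$$\begin{pmatrix} u \\ v \end{pmatrix} = R\begin{pmatrix} x \\ y \end{pmatrix},\qquad R = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}.$$

It is orthogonal, since $R^{\rm T}R = I$. Therefore,

$$g(u,v) = f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right)$$

$$= \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{(u\cos\varphi - v\sin\varphi)^2}{2\sigma_x^2} - \frac{(u\sin\varphi + v\cos\varphi)^2}{2\sigma_y^2}\right)$$

$$= \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{u^2\cos^2\varphi + v^2\sin^2\varphi - 2uv\cos\varphi\sin\varphi}{2\sigma_x^2} - \frac{u^2\sin^2\varphi + v^2\cos^2\varphi + 2uv\cos\varphi\sin\varphi}{2\sigma_y^2}\right).$$

(d) For $\varphi = 90^\circ$: $\cos\varphi = 0$, $\sin\varphi = 1$, etc., the expression $g(u,v)$ factorizes, i.e., $g(u,v) = g_u(u)\,g_v(v)$.

(e) For $\sigma_x = \sigma_y = \sigma$:

$$g(u,v) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{u^2 + v^2}{2\sigma^2}\right) = g(u)\,g(v)\,.$$

With $x = r\cos\varphi$, $y = r\sin\varphi$:

$$J = \left|\frac{\partial(x,y)}{\partial(r,\varphi)}\right| = \left|\begin{matrix} \cos\varphi & \sin\varphi \\ -r\sin\varphi & r\cos\varphi \end{matrix}\right| = r\,,$$

$$g(r,\varphi) = r f(x,y) = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right),$$

$$g_r(r) = \int_0^{2\pi} g(r,\varphi)\,d\varphi = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right),$$

$$g_\varphi(\varphi) = \int_0^{\infty} g(r,\varphi)\,dr = \frac{1}{4\pi\sigma^2}\int_0^{\infty}\exp\left(-\frac{u}{2\sigma^2}\right)du = \frac{1}{2\pi}\left[-\exp\left(-\frac{u}{2\sigma^2}\right)\right]_0^{\infty} = \frac{1}{2\pi}\,,$$

where $u = r^2$. Thus $g(r,\varphi) = g_r(r)\,g_\varphi(\varphi)$; therefore, $r$ and $\varphi$ are independent.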
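
Problem 3.5

$$g = 4\pi^2\frac{\ell}{T^2} = 4\pi^2\,\frac{99.8}{2.03^2}\ {\rm cm\,s^{-2}} = 956.09\ {\rm cm\,s^{-2}}\,,$$

$$\frac{\partial g}{\partial\ell} = \frac{4\pi^2}{T^2} = 9.58\ {\rm s^{-2}}\,,\qquad \frac{\partial g}{\partial T} = -\frac{8\pi^2\ell}{T^3} = -942\ {\rm cm\,s^{-3}}\,,$$

$$(\Delta g)^2 = \left(\frac{\partial g}{\partial\ell}\right)^2(\Delta\ell)^2 + \left(\frac{\partial g}{\partial T}\right)^2(\Delta T)^2 = 2226\ {\rm cm^2\,s^{-4}}\,,$$

$$\Delta g = 47.19\ {\rm cm\,s^{-2}}\,.$$

Problem 3.6

(a) $y = Tx$ with

$$T = \begin{pmatrix} \partial y_1/\partial x_1 & \partial y_1/\partial x_2 \\ \partial y_2/\partial x_1 & \partial y_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} \partial p/\partial m & \partial p/\partial v \\ \partial E/\partial m & \partial E/\partial v \end{pmatrix} = \begin{pmatrix} v & m \\ \frac12 v^2 & mv \end{pmatrix},\qquad C_x = \begin{pmatrix} a^2m^2 & 0 \\ 0 & b^2v^2 \end{pmatrix},$$

$$C_y = TC_xT^{\rm T} = \begin{pmatrix} (a^2 + b^2)m^2v^2 & (\frac12 a^2 + b^2)m^2v^3 \\ (\frac12 a^2 + b^2)m^2v^3 & (\frac14 a^2 + b^2)m^2v^4 \end{pmatrix},$$

i.e., $\sigma^2(p) = (a^2 + b^2)m^2v^2$, $\sigma^2(E) = (\frac14 a^2 + b^2)m^2v^4$, $\mathrm{cov}(p,E) = (\frac12 a^2 + b^2)m^2v^3$, and

$$\rho(p,E) = \frac{\mathrm{cov}(p,E)}{\sigma(p)\sigma(E)} = \frac{(\frac12 a^2 + b^2)m^2v^3}{\sqrt{(a^2 + b^2)m^2v^2\,(\frac14 a^2 + b^2)m^2v^4}} = \frac{\frac12 a^2 + b^2}{\sqrt{(a^2 + b^2)(\frac14 a^2 + b^2)}}\,.$$

For $a = 0$ or $b = 0$: $\rho = 1$. (In this case either $m$ or $v$ is completely determined. Therefore, there is a strict relation between $E$ and $p$.) If, however, $a, b \neq 0$, one has $\rho \neq 1$; e.g., for $a = b$ one obtains $\rho = 3/\sqrt{10}$.

(b) $m = \frac12 p^2/E$, $v = 2E/p$,

$$C_y = \begin{pmatrix} (a^2 + b^2)p^2 & (a^2 + 2b^2)Ep \\ (a^2 + 2b^2)Ep & (a^2 + 4b^2)E^2 \end{pmatrix},$$

$$T = \left(\frac{\partial m}{\partial y_1}, \frac{\partial m}{\partial y_2}\right) = \left(\frac{\partial m}{\partial p}, \frac{\partial m}{\partial E}\right) = \left(\frac{p}{E}, -\frac{p^2}{2E^2}\right),$$

$$C_m = \sigma^2(m) = TC_yT^{\rm T} = \frac{p^4}{E^2}\left[(a^2 + b^2) - (a^2 + 2b^2) + \tfrac14(a^2 + 4b^2)\right] = \frac{a^2p^4}{4E^2} = a^2m^2\,.$$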
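The matrix products in (a) and (b) can be checked numerically. This Java sketch implements $C_y = TC_xT^{\rm T}$ for a general Jacobian and applies it twice; the values $m = 2$, $v = 3$, $a = b = 0.1$ in main are arbitrary illustrative numbers, and all names are hypothetical.

```java
/** Hypothetical helper for Problem 3.6: linear error propagation C_y = T C_x T^T. */
class MomentumEnergyCovariance {

    /** T C T^T for an (r x k) Jacobian t and a (k x k) covariance matrix c. */
    static double[][] propagate(double[][] t, double[][] c) {
        int r = t.length;
        double[][] out = new double[r][r];
        for (int i = 0; i < r; i++)
            for (int j = 0; j < r; j++)
                for (int k = 0; k < c.length; k++)
                    for (int q = 0; q < c.length; q++)
                        out[i][j] += t[i][k] * c[k][q] * t[j][q];
        return out;
    }

    /** C_y for y = (p, E) = (m v, m v^2 / 2), with dm = a m, dv = b v and cov(m, v) = 0. */
    static double[][] covMomentumEnergy(double m, double v, double a, double b) {
        double[][] t = {{v, m}, {0.5 * v * v, m * v}};
        double[][] cx = {{a * a * m * m, 0.0}, {0.0, b * b * v * v}};
        return propagate(t, cx);
    }

    /** sigma(m) for m = p^2 / (2E), propagated from (p, E) with the full matrix cy. */
    static double sigmaMass(double p, double e, double[][] cy) {
        double[][] t = {{p / e, -p * p / (2.0 * e * e)}};
        return Math.sqrt(propagate(t, cy)[0][0]);
    }

    public static void main(String[] args) {
        double m = 2.0, v = 3.0, a = 0.1, b = 0.1;               // illustrative values
        double[][] cy = covMomentumEnergy(m, v, a, b);
        double rho = cy[0][1] / Math.sqrt(cy[0][0] * cy[1][1]);
        System.out.printf("rho(p,E) = %.4f, 3/sqrt(10) = %.4f%n", rho, 3.0 / Math.sqrt(10.0));
        double p = m * v, e = 0.5 * m * v * v;
        double[][] noCov = {{cy[0][0], 0.0}, {0.0, cy[1][1]}};   // cov(p, E) wrongly dropped
        System.out.printf("sigma(m) = %.4f (a m = %.4f); without cov(p,E): %.4f%n",
                sigmaMass(p, e, cy), a * m, sigmaMass(p, e, noCov));
    }
}
```

The last line of output illustrates the remark in part (b) of the problem: dropping $\mathrm{cov}(p,E)$ gives a $\sigma(m)$ that no longer equals $am$.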
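
Problem 5.1

(a) $$W^n_{k+1} = \binom{n}{k+1}p^{k+1}q^{n-k-1} = \frac{n!}{(k+1)!\,(n-k-1)!}\,p^{k+1}q^{n-k-1} = \frac{n!}{k!\,(n-k)!}\,\frac{n-k}{k+1}\,p^kq^{n-k}\,\frac{p}{q} = W^n_k\,\frac{n-k}{k+1}\,\frac{p}{q}\,.$$

(b) $$W^5_3 = \binom{5}{3}\,0.8^3 \cdot 0.2^2 = 10 \cdot 0.512 \cdot 0.04 = 0.2048\,,$$

$$W^5_4 = W^5_3\cdot\frac{2}{4}\cdot\frac{0.8}{0.2} = 0.2048 \cdot 2 = 0.4096\,,$$

$$W^5_5 = W^5_4\cdot\frac{1}{5}\cdot\frac{0.8}{0.2} = 0.4096 \cdot 0.8 = 0.32768\,,$$

$$P_3 = 1 - W^5_4 - W^5_5 = 0.26272\,,\qquad P_2 = P_3 - W^5_3 = 0.05792\,.$$

(c) Using the result of (a) we obtain

$$W^n_k - W^n_{k-1} = W^n_{k-1}\left[\frac{n-k+1}{k}\,\frac{p}{q} - 1\right].$$

Thus the probability increases as long as the expression in brackets is positive, i.e.,

$$\frac{(n-k+1)p}{kq} - 1 > 0\,.$$

Since $k$ and $q$ are positive we have

$$(n-k+1)p > kq = k(1-p)\,,\qquad k < (n+1)p\,.$$

The most probable value $k_m$ is the largest value of $k$ for which this inequality holds.

(d) $$\varphi_{x_i}(t) = E\{e^{itx_i}\} = q\,e^{it\cdot 0} + p\,e^{it\cdot 1} = q + p\,e^{it}\,,$$

$$\varphi_x(t) = \left(\varphi_{x_i}(t)\right)^n = (q + p\,e^{it})^n = \sum_{k=0}^{n}\binom{n}{k}p^kq^{n-k}e^{itk}\,,$$

i.e., $x$ takes the value $k$ with probability $f(k) = W^n_k$.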
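The recursion of (a) produces all $W^n_k$ from $W^n_0 = q^n$ without evaluating factorials. A minimal Java sketch, assuming $0 < p < 1$; here $p = 0.8$ is the probability for a piece to be free from defects, as in (b). BinomialRecursion is a hypothetical name.

```java
/** Hypothetical helper for Problem 5.1: binomial probabilities via the recursion of part (a). */
class BinomialRecursion {

    /** W^n_k for k = 0..n, starting from W^n_0 = q^n. Requires 0 < p < 1. */
    static double[] probabilities(int n, double p) {
        double q = 1.0 - p;
        double[] w = new double[n + 1];
        w[0] = Math.pow(q, n);
        for (int k = 0; k < n; k++) {
            w[k + 1] = w[k] * (n - k) / (k + 1.0) * (p / q);   // W_{k+1} = W_k (n-k)/(k+1) p/q
        }
        return w;
    }

    /** Most probable value from part (c): the largest integer k with k < (n+1)p. */
    static int mostProbable(int n, double p) {
        return (int) Math.ceil((n + 1) * p) - 1;
    }

    public static void main(String[] args) {
        int n = 5;
        double p = 0.8;
        double[] w = probabilities(n, p);
        double p3 = w[0] + w[1] + w[2] + w[3];   // at most 3 pieces free from defects
        double p2 = p3 - w[3];                   // at most 2 pieces free from defects
        System.out.printf("P2 = %.5f, P3 = %.5f, k_m = %d%n", p2, p3, mostProbable(n, p));
    }
}
```

When $(n+1)p$ is an integer, $W^n_k = W^n_{k-1}$ at $k = (n+1)p$, so two values are equally probable; the sketch returns the smaller one.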


Problem 5.2

For $\lambda = 3$ one has $f(0) = \dfrac{\lambda^0}{0!}e^{-\lambda} = e^{-3} = 0.0498 \approx 0.05 = 5\,\%$.

Problem 5.3

(a) The fraction

$$f_r = 2\psi_0\left(\frac{(R_0 - a_2R_0) - R_0}{bR_0}\right) = 2\psi_0\left(-\frac{a_2}{b}\right)$$

is rejected, since $R < R_0 - \Delta_2$ or $R > R_0 + \Delta_2$. Correspondingly the fractions

$$f_2 = 2\psi_0\left(-\frac{a_1}{b}\right) - f_r \qquad\text{and}\qquad f_5 = 1 - f_2 - f_r$$

give prices $2C$ and $5C$, respectively. Therefore,

$$P = 2f_2C + 5f_5C - C = C\{2f_2 + 5 - 5f_2 - 5f_r - 1\} = C\{4 - 3f_2 - 5f_r\} = C\left\{4 - 6\psi_0\left(-\frac{a_1}{b}\right) + 3f_r - 5f_r\right\}.$$

(b) $\psi_0(-0.2) = 0.421$; $\psi_0(-1) = 0.159$, i.e., $P = C\{4 - 2.526 - 0.636\} = 0.838\,C$.

(c) $$\frac{d^2f}{dx^2} = \frac{1}{\sqrt{2\pi}\,b^3}\exp\left(-\frac{(x-a)^2}{2b^2}\right)\left\{\frac{(x-a)^2}{b^2} - 1\right\} = 0$$

is fulfilled if the expression in the last set of braces vanishes, i.e., for $x = a \pm b$.
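A Java sketch of the profit formula from (a), assuming $R_m = R_0$ and returning $P$ in units of $C$. Java's standard library has no error function, so psi0 here uses the Abramowitz and Stegun rational approximation 7.1.26 for erf (absolute error about 1.5e-7); this is a swapped-in numerical technique rather than a table lookup. ResistorProfit is a hypothetical name.

```java
/** Hypothetical helper for Problem 5.3: expected profit per resistor, in units of C. */
class ResistorProfit {

    /** erf(x) via Abramowitz and Stegun 7.1.26 (absolute error below about 1.5e-7). */
    static double erf(double x) {
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return Math.signum(x) * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Standard normal distribution function psi_0(x). */
    static double psi0(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }

    /** P / C for Delta_1 = a1 R0, Delta_2 = a2 R0, sigma = b R0 and R_m = R0. */
    static double profit(double a1, double a2, double b) {
        double fr = 2.0 * psi0(-a2 / b);        // rejected
        double f2 = 2.0 * psi0(-a1 / b) - fr;   // sold at 2C
        double f5 = 1.0 - f2 - fr;              // sold at 5C
        return 2.0 * f2 + 5.0 * f5 - 1.0;       // revenue minus production cost C
    }

    public static void main(String[] args) {
        System.out.printf("P = %.3f C%n", profit(0.01, 0.05, 0.05));
    }
}
```

With unrounded values of $\psi_0$ the sketch gives about $0.841\,C$; the $0.838\,C$ above results from the three-digit table values.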


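Problem 5.4

(a) $$P(R) = \int_0^R g_r(r)\,dr = \frac{1}{\sigma^2}\int_0^R r\,e^{-r^2/2\sigma^2}\,dr = \frac{1}{2\sigma^2}\int_0^{R^2}e^{-u/2\sigma^2}\,du = \left[-e^{-u/2\sigma^2}\right]_0^{R^2} = 1 - e^{-R^2/2\sigma^2}\,.$$

(b) $1 - P = \exp(-R^2/2\sigma^2)$, $R^2/2\sigma^2 = -\ln(1 - P)$.

For $\sigma = 1$, $P = 0.9$ one obtains

$$R = \sqrt{-2\ln 0.1} = \sqrt{4.61} = 2.15\,.$$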
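A two-method Java sketch of (a) and its inverse (b); HitRadius is a hypothetical name.

```java
/** Hypothetical helper for Problem 5.4: hits distributed as in Problem 3.4 (e). */
class HitRadius {

    /** P(R) = 1 - exp(-R^2 / (2 sigma^2)). */
    static double probabilityWithin(double r, double sigma) {
        return 1.0 - Math.exp(-r * r / (2.0 * sigma * sigma));
    }

    /** Inverse of P(R): R = sigma * sqrt(-2 ln(1 - P)). */
    static double radiusFor(double p, double sigma) {
        return sigma * Math.sqrt(-2.0 * Math.log(1.0 - p));
    }

    public static void main(String[] args) {
        double r = radiusFor(0.9, 1.0);
        System.out.printf("R = %.3f, check P(R) = %.3f%n", r, probabilityWithin(r, 1.0));
    }
}
```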


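Problem 5.5

(a) $0 \le u < 1$:

$$f(u) = \int_0^u f_1(y)\,dy = \int_0^u y\,dy = \frac12u^2\,,$$

$1 \le u < 2$:

$$f(u) = \int_{u-1}^{1} f_1(y)\,dy + \int_1^u f_2(y)\,dy = \int_{u-1}^{1} y\,dy + \int_1^u (2 - y)\,dy = \frac12(1 - (u-1)^2) + \frac12(1 - (2-u)^2) = \frac12(-3 + 6u - 2u^2)\,,$$

$2 \le u < 3$:

$$f(u) = \int_{u-1}^{2} f_2(y)\,dy = \int_{u-1}^{2}(2 - y)\,dy = \int_0^{3-u} z\,dz = \frac12(3 - u)^2\,.$$

(b) $$f(u) = \frac{1}{2\pi\sigma_x\sigma_y}\int_{-\infty}^{\infty}\exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{(u - x)^2}{2\sigma_y^2}\right)dx\,.$$

Completing the square yields for the exponent, with $\sigma^2 = \sigma_x^2 + \sigma_y^2$,

$$-\frac{\sigma^2}{2\sigma_x^2\sigma_y^2}\left(x - \frac{\sigma_x^2}{\sigma^2}u\right)^2 - \frac{u^2}{2\sigma_y^2}\left(1 - \frac{\sigma_x^2}{\sigma^2}\right) = -\frac{\sigma^2}{2\sigma_x^2\sigma_y^2}\left(x - \frac{\sigma_x^2}{\sigma^2}u\right)^2 - \frac{u^2}{2\sigma^2}\,.$$

With the change of variables $v = (\sigma/\sigma_x\sigma_y)(x - \sigma_x^2u/\sigma^2)$ we obtain

$$f(u) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left(-\frac{u^2}{2\sigma^2}\right)\frac{\sigma_x\sigma_y}{\sigma}\int_{-\infty}^{\infty}\exp\left(-\frac{v^2}{2}\right)dv = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{u^2}{2\sigma^2}\right).$$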

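Problem 6.1

(a) $E(S) = E\{\sum a_ix_i\} = \sum a_iE(x_i) = \hat{x}\sum a_i = \hat{x}$ if and only if $\sum a_i = 1$.

(b) $\sigma^2(S) = \sum_i\left(\dfrac{\partial S}{\partial x_i}\right)^2\sigma^2(x_i) = \sigma^2\sum_i a_i^2$,

$\sigma^2(S_1) = \sigma^2\left(\frac{1}{16} + \frac14 + \frac{1}{16}\right) = \frac38\sigma^2 = 0.375\,\sigma^2$,

$\sigma^2(S_2) = \sigma^2\left(\frac{1}{25} + \frac{4}{25} + \frac{4}{25}\right) = \frac{9}{25}\sigma^2 = 0.360\,\sigma^2$,

$\sigma^2(S_3) = \sigma^2\left(\frac{1}{36} + \frac19 + \frac14\right) = \frac{7}{18}\sigma^2 = 0.389\,\sigma^2$.

(c) $\sigma^2(S) = [a_1^2 + a_2^2 + (1 - (a_1 + a_2))^2]\,\sigma^2$,

$\dfrac{\partial\sigma^2(S)}{\partial a_1} = [2a_1 - 2(1 - (a_1 + a_2))]\,\sigma^2 = 0$,

$\dfrac{\partial\sigma^2(S)}{\partial a_2} = [2a_2 - 2(1 - (a_1 + a_2))]\,\sigma^2 = 0$,

$a_1 = 1 - (a_1 + a_2)$, $a_2 = 1 - (a_1 + a_2)$, $a_1 = a_2 = \frac13$;

$\sigma^2(S) = \frac13\sigma^2 = 0.333\,\sigma^2$.

Problem 6.2

With the shift $a = 20$, i.e., $\Delta_j = x_j - a$:

$\bar{\Delta} = \frac{1}{10}(-2 + 1 + 3 - 1 + 0 + 1 + 0 - 1 + 0 - 3) = -0.2$, $\quad\bar{x} = a + \bar{\Delta} = 19.8$,

$s^2 = \frac19(1.8^2 + 1.2^2 + 3.2^2 + 0.8^2 + 0.2^2 + 1.2^2 + 0.2^2 + 0.8^2 + 0.2^2 + 2.8^2) = \frac19 \cdot 25.60 = 2.84$,

$s^2_{\bar{x}} = \frac{1}{10}s^2 = 0.284$. Therefore $\bar{x} = 19.8 \pm 0.53$.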
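A Java sketch of the computation, including the shift $a = 20$ used above; any shift gives the same $s^2$, it only keeps the numbers small. SampleStatistics is a hypothetical name.

```java
/** Hypothetical helper for Problem 6.2: sample mean, sample variance and error of the mean. */
class SampleStatistics {

    static double mean(double[] x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.length;
    }

    /** s^2 = (1/(n-1)) sum (x_j - xbar)^2, computed from the shifted values d_j = x_j - a. */
    static double variance(double[] x, double a) {
        double n = x.length, sum = 0.0, sumSq = 0.0;
        for (double v : x) {
            double d = v - a;
            sum += d;
            sumSq += d * d;
        }
        return (sumSq - sum * sum / n) / (n - 1.0);
    }

    public static void main(String[] args) {
        double[] x = {18, 21, 23, 19, 20, 21, 20, 19, 20, 17};
        double s2 = variance(x, 20.0);
        double s2Mean = s2 / x.length;
        System.out.printf("mean = %.2f, s^2 = %.3f, s^2(mean) = %.4f, mean = %.1f +/- %.2f%n",
                mean(x), s2, s2Mean, mean(x), Math.sqrt(s2Mean));
    }
}
```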


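Problem 6.3

(a) $\bar{x}_1 = 0.2$, $\bar{x}_2 = 0.5$, $\bar{x}_3 = 0.7$,

$\tilde{x} = \sum_i p_i\bar{x}_i = 0.02 + 0.35 + 0.14 = 0.51$,

$s_1^2 = \frac19(2 - 10 \cdot 0.2^2) = 0.178$, $\quad s_2^2 = \frac19(5 - 10 \cdot 0.5^2) = 0.278$, $\quad s_3^2 = \frac19(7 - 10 \cdot 0.7^2) = 0.233$,

$s^2_{\tilde{x}} = \sum_i\dfrac{p_i^2s_i^2}{n_i} = \dfrac{0.1^2}{10}\,0.178 + \dfrac{0.7^2}{10}\,0.278 + \dfrac{0.2^2}{10}\,0.233 = 0.001 \cdot 0.178 + 0.049 \cdot 0.278 + 0.004 \cdot 0.233 = 0.0147$,

$s_{\tilde{x}} = \sqrt{s^2_{\tilde{x}}} = 0.12$.

The result $\tilde{x} = 0.51 \pm 0.12$ does not favor any one party significantly.

(b) $s_1 = 0.422$, $s_2 = 0.527$, $s_3 = 0.483$,

$p_1s_1 = 0.0422$, $p_2s_2 = 0.369$, $p_3s_3 = 0.0966$, $\quad\sum_i p_is_i = 0.508$,

$\dfrac{n_1}{n} = 0.083$, $\quad\dfrac{n_2}{n} = 0.726$, $\quad\dfrac{n_3}{n} = 0.190$.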
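A Java sketch of the stratified estimate and of the allocation $n_i/n = p_is_i/\sum_j p_js_j$ applied in (b); StratifiedSample is a hypothetical name.

```java
/** Hypothetical helper for Problem 6.3: stratified sampling. */
class StratifiedSample {

    static double mean(int[] x) {
        double s = 0.0;
        for (int v : x) s += v;
        return s / x.length;
    }

    /** Sample variance s_i^2 with denominator n_i - 1. */
    static double variance(int[] x) {
        double m = mean(x), ss = 0.0;
        for (int v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1.0);
    }

    /** {x~, s_x~} with x~ = sum p_i xbar_i and s_x~^2 = sum p_i^2 s_i^2 / n_i. */
    static double[] estimate(double[] p, int[][] samples) {
        double est = 0.0, var = 0.0;
        for (int i = 0; i < p.length; i++) {
            est += p[i] * mean(samples[i]);
            var += p[i] * p[i] * variance(samples[i]) / samples[i].length;
        }
        return new double[] {est, Math.sqrt(var)};
    }

    /** Fractions n_i / n = p_i s_i / sum_j p_j s_j. */
    static double[] allocation(double[] p, int[][] samples) {
        double[] f = new double[p.length];
        double total = 0.0;
        for (int i = 0; i < p.length; i++) {
            f[i] = p[i] * Math.sqrt(variance(samples[i]));
            total += f[i];
        }
        for (int i = 0; i < f.length; i++) f[i] /= total;
        return f;
    }

    public static void main(String[] args) {
        double[] p = {0.1, 0.7, 0.2};
        int[][] samples = {
            {0, 0, 0, 0, 1, 0, 1, 0, 0, 0},
            {0, 0, 1, 1, 0, 1, 0, 1, 1, 0},
            {0, 1, 1, 1, 1, 0, 1, 1, 1, 0}
        };
        double[] r = estimate(p, samples);
        double[] f = allocation(p, samples);
        System.out.printf("x~ = %.2f +/- %.2f%n", r[0], r[1]);
        System.out.printf("n1/n = %.3f, n2/n = %.3f, n3/n = %.3f%n", f[0], f[1], f[2]);
    }
}
```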

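Problem 6.4

(a) For simplicity we write $\chi^2 = u$.

$$\mu_3 = E\{(u - \hat{u})^3\} = E\{u^3 - 3u^2\hat{u} + 3u\hat{u}^2 - \hat{u}^3\} = E(u^3) - 3\hat{u}E(u^2) + 3\hat{u}^2E(u) - \hat{u}^3 = \lambda_3 - 3\hat{u}E(u^2) + 2\hat{u}^3\,.$$

With the characteristic function $\varphi(t) = (1 - 2it)^{-\lambda}$, $\lambda = n/2$, one has $\hat{u} = 2\lambda$, $\sigma^2 = 4\lambda$, $E(u^2) = \sigma^2 + \hat{u}^2 = 4\lambda^2 + 4\lambda$ and

$$\lambda_3 = \frac{1}{i^3}\varphi'''(0) = i\varphi'''(0) = i(-\lambda)(-\lambda - 1)(-\lambda - 2)(-2i)^3 = 8\lambda(\lambda + 1)(\lambda + 2) = 8\lambda^3 + 24\lambda^2 + 16\lambda\,,$$

$$3\hat{u}E(u^2) = 6\lambda(4\lambda^2 + 4\lambda) = 24\lambda^3 + 24\lambda^2\,,\qquad 2\hat{u}^3 = 2\cdot(2\lambda)^3 = 16\lambda^3\,,$$

$$\mu_3 = 16\lambda\,,\qquad \gamma = \mu_3/\sigma^3 = 16\lambda/8\lambda^{3/2} = 2\lambda^{-1/2}\,.$$

(b) Obviously $\gamma = 2/\sqrt{\lambda} \to 0$ holds for $\lambda = \frac12 n \to \infty$.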


Problem  6.5 

See  Fig.  G.2. 


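[Fig. G.2: Histogram of the data of Problem 6.5 (axes: x, N(x)).]

Problem 7.1

(a) $$L = \prod_{j=1}^{N}\frac{1}{b} = \frac{1}{b^N}\,;$$

since $b \ge x_{\max}$ is required, obviously one has $L = \max$ for $\tilde{b} = x_{\max}$.

(b) $\lambda_1 = a$, $\lambda_2 = \Gamma$.

$$L = \prod_{j=1}^{N}\frac{2}{\pi\lambda_2}\,\frac{\lambda_2^2}{4(x^{(j)} - \lambda_1)^2 + \lambda_2^2}\,,$$

$$\ell = \ln L = N(\ln 2 - \ln\pi + \ln\lambda_2) - \sum_{j=1}^{N}\ln[4(x^{(j)} - \lambda_1)^2 + \lambda_2^2]\,,$$

$$\frac{\partial\ell}{\partial\lambda_1} = \sum_{j=1}^{N}\frac{8(x^{(j)} - \lambda_1)}{4(x^{(j)} - \lambda_1)^2 + \lambda_2^2} = 0\,,$$

$$\frac{\partial\ell}{\partial\lambda_2} = \frac{N}{\lambda_2} - 2\lambda_2\sum_{j=1}^{N}\frac{1}{4(x^{(j)} - \lambda_1)^2 + \lambda_2^2} = 0\,.$$

There is not necessarily a unique solution since the equations are not linear in $\lambda_1$ and $\lambda_2$. For $|x^{(j)} - \lambda_1| \ll \lambda_2$, however, we may write

$$\sum_{j=1}^{N}(x^{(j)} - \lambda_1) = 0\,,\qquad N\lambda_1 = \sum_{j=1}^{N}x^{(j)}\,,\qquad \tilde{\lambda}_1 = \tilde{a} = \bar{x}\,.$$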


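Problem 7.2

(a) $$L = \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp[-(x^{(j)} - \lambda)^2/2\sigma^2] = (\sqrt{2\pi}\,\sigma)^{-N}\prod_{j=1}^{N}\exp[-(x^{(j)} - \lambda)^2/2\sigma^2]\,,$$

$$\ell = -N\ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\sum_{j=1}^{N}(x^{(j)} - \lambda)^2\,,$$

$$\ell' = \frac{1}{\sigma^2}\sum_{j=1}^{N}(x^{(j)} - \lambda)\,,\qquad \ell'' = -\frac{N}{\sigma^2}\,,\qquad I(\lambda) = -E(\ell'') = N/\sigma^2\,.$$

(b) $$L = \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi}\sqrt{\lambda}}\exp\left[-\frac{(x^{(j)} - a)^2}{2\lambda}\right] = (2\pi)^{-N/2}\lambda^{-N/2}\prod_{j=1}^{N}\exp\left[-\frac{(x^{(j)} - a)^2}{2\lambda}\right],$$

$$\ell = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\lambda - \frac{1}{2\lambda}\sum_{j=1}^{N}(x^{(j)} - a)^2\,,$$

$$\ell' = -\frac{N}{2\lambda} + \frac{1}{2\lambda^2}\sum_{j=1}^{N}(x^{(j)} - a)^2\,,\qquad \ell'' = \frac{N}{2\lambda^2} - \frac{1}{\lambda^3}\sum_{j=1}^{N}(x^{(j)} - a)^2\,,$$

$$I(\lambda) = -E(\ell'') = -\frac{N}{2\lambda^2} + \frac{1}{\lambda^3}E\left\{\sum_{j=1}^{N}(x^{(j)} - a)^2\right\} = -\frac{N}{2\lambda^2} + \frac{1}{\lambda^3}N\lambda = \frac{1}{2\lambda^2}(-N + 2N) = \frac{N}{2\lambda^2}\,.$$

Setting $\ell' = 0$ gives the estimator $S = \tilde{\lambda} = \frac1N\sum_{j=1}^{N}(x^{(j)} - a)^2$, and

$$E(S) = \frac1N E\left\{\sum_{j=1}^{N}(x^{(j)} - a)^2\right\} = \sigma^2 = \lambda\,.$$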


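Problem 7.3

(a) $\sigma^2(S) \ge \dfrac{1}{I(\lambda)} = \dfrac{\sigma^2}{N}$.

In Problem 6.1 (c) we had $\sigma^2(\bar{x}) = \frac13\sigma^2$ for $N = 3$.

(b) $\sigma^2(S) \ge \dfrac{1}{I(\lambda)} = \dfrac{2\lambda^2}{N}$.

(c) From Problem 7.2 we take

$$\ell' = -\frac{N}{2\lambda} + \frac{1}{2\lambda^2}NS = \frac{N}{2\lambda^2}(S - \lambda) = \frac{N}{2\lambda^2}(S - E(S))\,.$$

This has the form (7.3.12) with $A(\lambda) = N/2\lambda^2$. Thus, $\sigma^2(S) = 1/A(\lambda) = 2\lambda^2/N$.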

Problem 8.1

$\bar{x}_1 = 21.11$, $\bar{x}_2 = 21.00$,

$s_1^2 = 14.86$, $s_2^2 = 10.00$,

$T = s_1^2/s_2^2 = 1.486$, $F_{0.95}(8,6) = 4.15$.

Since $T < F_{0.95}(8,6)$, the variance of sample (2) is not significantly smaller.
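A Java sketch of the test statistic. The quantile $F_{0.95}(8,6) = 4.15$ is hard-coded from the solution rather than computed, because the Java standard library has no F-distribution routine. FTestStatistic is a hypothetical name.

```java
/** Hypothetical helper for Problem 8.1: F-test statistic T = s1^2 / s2^2. */
class FTestStatistic {

    /** Sample variance with denominator n - 1. */
    static double sampleVariance(double[] x) {
        double m = 0.0;
        for (double v : x) m += v;
        m /= x.length;
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] x1 = {21, 19, 14, 27, 25, 23, 22, 18, 21};
        double[] x2 = {16, 24, 22, 21, 25, 21, 18};
        double t = sampleVariance(x1) / sampleVariance(x2);
        double fQuantile = 4.15;   // F_0.95(8, 6) as quoted above (table value, not computed)
        System.out.printf("T = %.3f; H0 rejected: %b%n", t, t > fQuantile);
    }
}
```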

Problem 8.2

$\bar{x} = 25.142$, $s^2 = 82.69/29 = 2.85$, $s = 1.69$,

$$T = \frac{\bar{x} - 25.5}{s/\sqrt{30}} = \frac{25.142 - 25.5}{1.69/5.48} = \frac{-0.358}{0.309} = -1.16\,,$$

$|T| < t_{0.95}(f = 29) = 1.70$.

Therefore the hypothesis cannot be rejected.
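A Java sketch of Student's test on the data of Problem 6.5; the quantile $t_{0.95}(29) = 1.70$ is hard-coded from the solution. StudentTest is a hypothetical name.

```java
/** Hypothetical helper for Problem 8.2: one-sample t statistic. */
class StudentTest {

    /** The 30 measured values of Problem 6.5. */
    static final double[] DATA = {
        26.02, 27.13, 24.78, 26.19, 22.57, 25.16, 24.39, 22.73, 25.35, 26.13,
        23.15, 26.29, 24.89, 23.76, 26.04, 22.03, 27.48, 25.42, 24.27, 26.58,
        27.17, 24.22, 21.86, 26.53, 27.09, 24.73, 26.12, 28.04, 22.78, 25.36
    };

    /** T = (xbar - mu0) / (s / sqrt(n)) with s^2 computed with denominator n - 1. */
    static double tStatistic(double[] x, double mu0) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double s = Math.sqrt(ss / (n - 1));
        return (mean - mu0) / (s / Math.sqrt(n));
    }

    public static void main(String[] args) {
        double t = tStatistic(DATA, 25.5);
        double tQuantile = 1.70;   // t_0.95 for f = 29 as quoted above (table value)
        System.out.printf("T = %.3f; hypothesis rejected: %b%n", t, Math.abs(t) > tQuantile);
    }
}
```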


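Problem 8.3

(a) Using $\sum_j(x^{(j)} - \bar{x})^2 = Ns'^2$,

$$f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \lambda^{(\Omega)}) = \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi}\,s'}\exp\left(-\frac{(x^{(j)} - \bar{x})^2}{2s'^2}\right) = (\sqrt{2\pi}\,s')^{-N}\exp\left(-\frac{N}{2}\right),$$

$$f(x^{(1)}, x^{(2)}, \ldots, x^{(N)}; \lambda^{(\omega)}) = (\sqrt{2\pi}\,\sigma_0)^{-N}\exp\left(-\frac{Ns'^2}{2\sigma_0^2}\right),$$

$$T = \left(\frac{\sigma_0}{s'}\right)^N\exp\left\{\frac12 Ns'^2\left(\frac{1}{\sigma_0^2} - \frac{1}{s'^2}\right)\right\},$$

$$\ln T = N(\ln\sigma_0 - \ln s') + \frac12 Ns'^2\left(\frac{1}{\sigma_0^2} - \frac{1}{s'^2}\right) = N(\ln\sigma_0 - \ln s') + \frac12 N\frac{s'^2}{\sigma_0^2} - \frac12 N\,.$$

(b) $T$ is a monotonically increasing function of $T' = Ns'^2/\sigma_0^2$ if $s' > \sigma_0$; otherwise it is monotonically decreasing. Therefore the test takes on the following form:

$T' > T'_{1-\alpha/2}$ or $T' < T'_{\alpha/2}$ for $H_0(\sigma = \sigma_0)$,

$T' > T'_{1-\alpha}$ for $H_0(\sigma < \sigma_0)$,

$T' < T'_{\alpha}$ for $H_0(\sigma > \sigma_0)$.

(c) The answer follows from (6.7.2) and $s'^2 = (N - 1)s^2/N$.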


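Problem 8.4

(a) $$\tilde{a} = \frac1n\sum_k n_kx_k = 202.36\,,\qquad \tilde{\sigma}^2 = \frac{1}{n-1}\sum_k n_k(x_k - \tilde{a})^2 = 13.40\,,\qquad \tilde{\sigma} = 3.66\,.$$

(b) $$p_k(x_k) = \psi_0\left(\frac{x_k + \frac12\Delta x - \tilde{a}}{\tilde{\sigma}}\right) - \psi_0\left(\frac{x_k - \frac12\Delta x - \tilde{a}}{\tilde{\sigma}}\right) = \psi_{0+} - \psi_{0-}\,.$$

x_k    n_k    ψ0+      ψ0−      np_k     (n_k − np_k)²/np_k
193      1    0.011    0.002    (0.9)          -
195      2    0.041    0.011    (3.0)          -
197      9    0.117    0.041     7.6         0.271
199     12    0.260    0.117    14.3         0.362
201     23    0.461    0.260    20.1         0.411
203     25    0.673    0.461    21.2         0.679
205     11    0.840    0.673    16.7         1.948
207      9    0.938    0.840     9.8         0.071
209      6    0.982    0.938     4.3         0.647
211      2    0.996    0.982    (1.4)          -
                                       χ² = 4.389

Bins with $np_k$ in parentheses are not used. The number of degrees of freedom is $7 - 2 = 5$. Since $\chi^2_{0.90}(5) = 9.24$, the test does not reject the hypothesis.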
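A Java sketch that redoes the table: estimates from (a), bin probabilities from $\psi_0$ with $\Delta x = 2$ (the spacing of the bin centres), and only bins with $np_k > 4$. Because java.lang.Math has no error function, erf is approximated with Abramowitz and Stegun 7.1.26 (absolute error about 1.5e-7); the class name HistogramChiSquare is hypothetical.

```java
/** Hypothetical helper for Problem 8.4: chi-squared test of a normal fit to a histogram. */
class HistogramChiSquare {

    /** erf(x), Abramowitz and Stegun 7.1.26. */
    static double erf(double x) {
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return Math.signum(x) * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Standard normal distribution function psi_0. */
    static double psi0(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }

    /** {a~, sigma~^2} from bin centres x_k and contents n_k, as in part (a). */
    static double[] meanAndVariance(double[] x, int[] counts) {
        double n = 0.0, sum = 0.0;
        for (int k = 0; k < x.length; k++) {
            n += counts[k];
            sum += counts[k] * x[k];
        }
        double mean = sum / n, ss = 0.0;
        for (int k = 0; k < x.length; k++) ss += counts[k] * (x[k] - mean) * (x[k] - mean);
        return new double[] {mean, ss / (n - 1.0)};
    }

    /** {chi^2, number of bins used}; only bins with n p_k > minExpected contribute. */
    static double[] chiSquare(double[] x, int[] counts, double dx, double minExpected) {
        double[] mv = meanAndVariance(x, counts);
        double sigma = Math.sqrt(mv[1]);
        int n = 0;
        for (int c : counts) n += c;
        double chi2 = 0.0;
        int used = 0;
        for (int k = 0; k < x.length; k++) {
            double pk = psi0((x[k] + 0.5 * dx - mv[0]) / sigma)
                      - psi0((x[k] - 0.5 * dx - mv[0]) / sigma);
            double expected = n * pk;
            if (expected > minExpected) {
                chi2 += (counts[k] - expected) * (counts[k] - expected) / expected;
                used++;
            }
        }
        return new double[] {chi2, used};
    }

    public static void main(String[] args) {
        double[] x = {193, 195, 197, 199, 201, 203, 205, 207, 209, 211};
        int[] counts = {1, 2, 9, 12, 23, 25, 11, 9, 6, 2};
        double[] mv = meanAndVariance(x, counts);
        double[] result = chiSquare(x, counts, 2.0, 4.0);
        int dof = (int) result[1] - 2;   // bins used minus two fitted parameters, as above
        System.out.printf("a = %.2f, sigma^2 = %.2f, chi^2 = %.3f with %d dof%n",
                mv[0], mv[1], result[0], dof);
    }
}
```

It selects the same 7 bins and gives $\chi^2$ close to 4.39; small differences come from the three-digit $\psi_0$ values in the table.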


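Problem 8.5

(a) $$\chi^2 = \frac{\left(13 - \frac{57\cdot 38}{111}\right)^2}{\frac{57\cdot 38}{111}} + \frac{\left(44 - \frac{57\cdot 73}{111}\right)^2}{\frac{57\cdot 73}{111}} + \frac{\left(25 - \frac{54\cdot 38}{111}\right)^2}{\frac{54\cdot 38}{111}} + \frac{\left(29 - \frac{54\cdot 73}{111}\right)^2}{\frac{54\cdot 73}{111}}$$

$$= \frac{42.43}{19.51} + \frac{42.43}{37.49} + \frac{42.43}{18.49} + \frac{42.43}{35.51} = 6.80\,.$$

Since $\chi^2_{0.90} = 2.71$ for $f = 1$, the hypothesis of independence is rejected.

(b) $$n_{ij} - n\tilde{p}_i\tilde{q}_j = n_{ij} - \frac1n(n_{i1} + n_{i2})(n_{1j} + n_{2j}) = \frac1n\left[n_{ij}(n_{11} + n_{12} + n_{21} + n_{22}) - (n_{i1} + n_{i2})(n_{1j} + n_{2j})\right].$$

One can easily show that the expression in square brackets takes the form $\pm(n_{11}n_{22} - n_{12}n_{21})$ for all $i, j$, with the plus sign for $i = j$ and the minus sign for $i \neq j$.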
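A Java sketch of the $\chi^2$ computation with expected counts (row sum)(column sum)/n, together with the $2\times 2$ shortcut $n(n_{11}n_{22} - n_{12}n_{21})^2/(n_{1\cdot}n_{2\cdot}n_{\cdot 1}n_{\cdot 2})$, which follows from (b) because all four numerators agree. The quantile 2.71 is taken from the solution; ContingencyTable is a hypothetical name.

```java
/** Hypothetical helper for Problem 8.5: chi^2 for a contingency table. */
class ContingencyTable {

    /** Pearson chi^2 with expected counts (row sum)(column sum) / n. */
    static double chiSquare(int[][] t) {
        int rows = t.length, cols = t[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double n = 0.0;
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) {
                rowSum[i] += t[i][j];
                colSum[j] += t[i][j];
                n += t[i][j];
            }
        double chi2 = 0.0;
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / n;
                chi2 += (t[i][j] - expected) * (t[i][j] - expected) / expected;
            }
        return chi2;
    }

    /** 2 x 2 shortcut n (n11 n22 - n12 n21)^2 / (n1. n2. n.1 n.2). */
    static double chiSquare2x2(int[][] t) {
        double d = (double) t[0][0] * t[1][1] - (double) t[0][1] * t[1][0];
        double r1 = t[0][0] + t[0][1], r2 = t[1][0] + t[1][1];
        double c1 = t[0][0] + t[1][0], c2 = t[0][1] + t[1][1];
        return (r1 + r2) * d * d / (r1 * r2 * c1 * c2);
    }

    public static void main(String[] args) {
        int[][] t = {{13, 44}, {25, 29}};   // rows: antiserum, bacteria only; columns: dead, alive
        double chi2 = chiSquare(t);
        System.out.printf("chi^2 = %.3f (shortcut %.3f); independence rejected: %b%n",
                chi2, chiSquare2x2(t), chi2 > 2.71);   // 2.71 = chi^2_0.90 for f = 1 (table value)
    }
}
```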


G.3  Programming  Problems 

Programming Problem 4.1: Program to Generate
Breit-Wigner-Distributed Random Numbers
Write a method with the following declaration

double[] breitWignerNumbers(double a, double gamma, int n).

It is to yield n random numbers which follow a Breit-Wigner distribution having a mean of a and a FWHM of Γ. Make the method part of a class which allows for interactive input of n, a, Γ and numerical as well as graphical output of the random numbers in the form of a histogram, Fig. G.3. (Example solution: S1Random)
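One possible sketch uses the inverse-transform method: if u is uniform on [0, 1), then $a + \frac{\Gamma}{2}\tan(\pi(u - \frac12))$ follows a Breit-Wigner distribution with location a and FWHM Γ. This sketch uses java.util.Random instead of the book's own generator, adds a hypothetical overload with an explicit Random for reproducibility, and replaces the graphical output by a crude text histogram; it is not the book's example solution.

```java
import java.util.Random;

/** Hypothetical sketch for Programming Problem 4.1. */
class BreitWignerSketch {

    private static final Random DEFAULT_RNG = new Random();

    /** The requested declaration; delegates to the overload with a default generator. */
    static double[] breitWignerNumbers(double a, double gamma, int n) {
        return breitWignerNumbers(a, gamma, n, DEFAULT_RNG);
    }

    /** Inverse transform: x = a + (gamma / 2) tan(pi (u - 1/2)) with u uniform on [0, 1). */
    static double[] breitWignerNumbers(double a, double gamma, int n, Random rng) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = a + 0.5 * gamma * Math.tan(Math.PI * (rng.nextDouble() - 0.5));
        }
        return x;
    }

    public static void main(String[] args) {
        double[] x = breitWignerNumbers(10.0, 3.0, 1000, new Random(1L));
        int[] hist = new int[20];                 // bins of width 1 on [0, 20)
        for (double v : x) {
            if (v >= 0.0 && v < 20.0) hist[(int) v]++;
        }
        for (int k = 0; k < hist.length; k++) {
            StringBuilder bar = new StringBuilder();
            for (int j = 0; j < hist[k]; j += 5) bar.append('*');   // roughly one star per 5 entries
            System.out.printf("%2d-%2d %4d %s%n", k, k + 1, hist[k], bar);
        }
    }
}
```

About half of the numbers should fall within a ± Γ/2, since that interval has the width of the FWHM.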

Programming  Problem  4.2:  Program  to  Generate  Random  Numbers 
from  a  Triangular  Distribution 
Write a method with the declaration

double[] triangularNumbersTrans(double a, double b, double c, int n).

It is to yield n random numbers following a triangular distribution with the parameters a, b, c, generated by the transformation procedure of Example 4.3.

Write a second method with the declaration

double[] triangularNumbersRej(double a, double b, double c, int n),

which solves the same problem but uses von Neumann's acceptance-rejection method. Which of the two programs is faster?

Write a class which allows the user to choose either method interactively. It should also allow for numerical and graphical output (as a histogram) of the generated numbers, Fig. G.4. (Example solution: S2Random)
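A sketch of both methods, assuming that a and b are the limits and c the position of the peak, with a < c < b (consistent with a = 1, b = 4, c = 3 in Fig. G.4); whether Example 4.3 uses exactly this parameterization is an assumption. The transformation inverts the distribution function piecewise, and the rejection method uses the constant bound 2/(b − a), so about half of the trial points are accepted. Both methods take an extra, hypothetical java.util.Random parameter for reproducibility.

```java
import java.util.Random;

/** Hypothetical sketch for Programming Problem 4.2: triangular distribution on [a, b] with peak at c. */
class TriangularSketch {

    /** Density, assuming a < c < b. */
    static double density(double x, double a, double b, double c) {
        if (x < a || x > b) return 0.0;
        if (x <= c) return 2.0 * (x - a) / ((b - a) * (c - a));
        return 2.0 * (b - x) / ((b - a) * (b - c));
    }

    /** Transformation method: invert the distribution function piecewise. */
    static double[] triangularNumbersTrans(double a, double b, double c, int n, Random rng) {
        double fc = (c - a) / (b - a);            // F(c)
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double u = rng.nextDouble();
            x[i] = (u < fc) ? a + Math.sqrt(u * (b - a) * (c - a))
                            : b - Math.sqrt((1.0 - u) * (b - a) * (b - c));
        }
        return x;
    }

    /** Acceptance-rejection with the constant bound f_max = 2 / (b - a). */
    static double[] triangularNumbersRej(double a, double b, double c, int n, Random rng) {
        double fMax = 2.0 / (b - a);
        double[] x = new double[n];
        int i = 0;
        while (i < n) {
            double trial = a + (b - a) * rng.nextDouble();
            if (fMax * rng.nextDouble() <= density(trial, a, b, c)) x[i++] = trial;
        }
        return x;
    }

    static double mean(double[] x) {
        double s = 0.0;
        for (double v : x) s += v;
        return s / x.length;
    }

    public static void main(String[] args) {
        Random rng = new Random(3L);
        int n = 1000000;
        long t0 = System.nanoTime();
        double[] xt = triangularNumbersTrans(1.0, 4.0, 3.0, n, rng);
        long t1 = System.nanoTime();
        double[] xr = triangularNumbersRej(1.0, 4.0, 3.0, n, rng);
        long t2 = System.nanoTime();
        System.out.printf("transformation: mean %.4f, %d ms%n", mean(xt), (t1 - t0) / 1000000);
        System.out.printf("rejection:      mean %.4f, %d ms%n", mean(xr), (t2 - t1) / 1000000);
        System.out.printf("expected mean (a + b + c) / 3 = %.4f%n", (1.0 + 4.0 + 3.0) / 3.0);
    }
}
```

The timing printed by main depends on the JVM and the machine, so no general answer to the speed question is claimed here.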

[Fig. G.3: Histogram of 1000 random numbers following a Breit-Wigner distribution with a = 10 and Γ = 3.]


Programming Problem 4.3: Program to Generate Data Points with Errors
of Different Size

Write a method similar to DatanRandom.line which generates data points $y_i = at_i + b + \Delta y_i$. The errors $\Delta y_i$, however, are not to be taken for all data points from the same uniform distribution with the width $\sigma$, but $\Delta y_i$ is to be sampled from a normal distribution with the width $\sigma_i$. The widths $\sigma_i$ are to be taken from a uniform distribution within the region $\sigma_{\min} < \sigma_i < \sigma_{\max}$.

Write a class which calls this method and which displays graphically the straight line $y = at + b$ as well as the simulated data points with error bars $y_i \pm \Delta y_i$, Fig. G.5. (Example solution: S3Random)
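A sketch of the generator, assuming the widths $\sigma_i$ are drawn uniformly from $[\sigma_{\min}, \sigma_{\max})$ and $\Delta y_i$ from a normal distribution with mean 0 and standard deviation $\sigma_i$ via Random.nextGaussian. It is not the book's DatanRandom.line; the class and method names and the abscissae in main are hypothetical, and plotting is left out.

```java
import java.util.Random;

/** Hypothetical sketch for Programming Problem 4.3. */
class LineDataSketch {

    /**
     * Returns {y, sigma}: y_i = a t_i + b + dy_i, where sigma_i is uniform in
     * [sigmaMin, sigmaMax) and dy_i is normal with mean 0 and standard deviation sigma_i.
     */
    static double[][] generate(double a, double b, double[] t,
                               double sigmaMin, double sigmaMax, Random rng) {
        double[] y = new double[t.length];
        double[] sigma = new double[t.length];
        for (int i = 0; i < t.length; i++) {
            sigma[i] = sigmaMin + (sigmaMax - sigmaMin) * rng.nextDouble();
            y[i] = a * t[i] + b + sigma[i] * rng.nextGaussian();
        }
        return new double[][] {y, sigma};
    }

    public static void main(String[] args) {
        double[] t = new double[10];
        for (int i = 0; i < t.length; i++) t[i] = 3.0 * i + 1.5;   // illustrative abscissae
        double[][] data = generate(0.5, 2.0, t, 0.5, 2.0, new Random(6L));
        for (int i = 0; i < t.length; i++) {
            System.out.printf("t = %5.2f  y = %6.3f +/- %5.3f  (line: %6.3f)%n",
                    t[i], data[0][i], data[1][i], 0.5 * t[i] + 2.0);
        }
    }
}
```

Returning the individual $\sigma_i$ along with $y_i$ is what allows each point to be drawn with its own error bar.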

[Fig. G.4: Histogram of 1000 random numbers following a triangular distribution with a = 1, b = 4, c = 3.]

Programming Problem 5.1: Convolution of Uniform Distributions

Because of the Central Limit Theorem the quantity $x = \sum_{i=1}^{N} x_i$ follows in the limit $N \to \infty$ the standard normal distribution if the $x_i$ come from an arbitrary distribution with mean zero and standard deviation $1/\sqrt{N}$. Choose for the $x_i$ the uniform distribution with the limits

$$a = -\sqrt{3/N}\,,\qquad b = -a\,.$$

Perform a large number $n_{\rm exp}$ of Monte Carlo experiments, each giving a random value $x$. Produce a histogram of the quantity $x$ and show in addition the distribution of $x$ as a continuous curve which you would expect from the standard normal distribution (Fig. G.6). (Use the class GraphicsWithHistogramAndPolyline for the simultaneous representation of histograms and curves.) Allow for interactive input of the quantities $n_{\rm exp}$ and $N$. (Example solution: S1Distrib)
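A sketch of the Monte Carlo experiment with the uniform limits given above, plus the expected number of entries $n_{\rm exp}\,\Delta x\,\varphi_0(x)$ per bin from the standard normal density, which is what one would overlay as a curve. Text output stands in for GraphicsWithHistogramAndPolyline here, and the names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch for Programming Problem 5.1 (text output instead of graphics). */
class CentralLimitSketch {

    /** n_exp values of x = sum of n uniform numbers on [-sqrt(3/n), sqrt(3/n)). */
    static double[] experiments(int nExp, int n, Random rng) {
        double b = Math.sqrt(3.0 / n);
        double a = -b;
        double[] x = new double[nExp];
        for (int k = 0; k < nExp; k++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += a + (b - a) * rng.nextDouble();
            x[k] = s;
        }
        return x;
    }

    /** Expected entries in a bin of width dx centred at x if x were standard normal. */
    static double expectedCount(double x, double dx, int nExp) {
        return nExp * dx * Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    public static void main(String[] args) {
        int nExp = 10000, n = 3;
        double dx = 0.5;
        double[] x = experiments(nExp, n, new Random(4L));
        int[] hist = new int[16];                 // bins of width 0.5 on [-4, 4)
        for (double v : x) {
            int k = (int) Math.floor((v + 4.0) / dx);
            if (k >= 0 && k < hist.length) hist[k]++;
        }
        for (int k = 0; k < hist.length; k++) {
            double centre = -4.0 + (k + 0.5) * dx;
            System.out.printf("%5.2f %5d %8.1f%n", centre, hist[k], expectedCount(centre, dx, nExp));
        }
    }
}
```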

Programming  Problem  5.2:  Convolution  of  Uniform  Distribution 
and  Normal  Distribution 

If $x$ is taken from a uniform distribution between $a$ and $b$ and if $y$ is taken from a normal distribution with mean zero and width $\sigma$, then the quantity $u = x + y$ follows the distribution (5.11.14). Perform a large number $n_{\mathrm{exp}}$ of Monte Carlo experiments, each resulting in a random number $u$. Display a histogram of the quantity $u$ and show in addition a curve of the distribution you would expect from (5.11.14), Fig. G.7. Allow for the interactive input of the quantities $n_{\mathrm{exp}}$, $a$, $b$, and $\sigma$.
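The curve of Programming Problem 5.2 can be computed from a closed form. This sketch assumes that (5.11.14) takes that form, which follows directly from the convolution integral:

$$f(u) = \frac{1}{b-a}\left[\Phi\!\left(\frac{u-a}{\sigma}\right) - \Phi\!\left(\frac{u-b}{\sigma}\right)\right],$$

where $\Phi$ is the standard normal distribution function. `java.lang.Math` has no error function, so the Abramowitz and Stegun rational approximation 7.1.26 (absolute error below about $1.5\cdot 10^{-7}$) is used as a stand-in. The class name is hypothetical.

```java
class UniformGaussConvolution {
    /** erf(x) via the Abramowitz and Stegun approximation 7.1.26. */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Standard normal distribution function Phi(z). */
    static double phi(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** Density of u = x + y with x uniform in [a, b] and y normal(0, sigma). */
    static double density(double u, double a, double b, double sigma) {
        return (phi((u - a) / sigma) - phi((u - b) / sigma)) / (b - a);
    }
}
```

Multiplying `density(u, a, b, sigma)` by $n_{\mathrm{exp}}$ and the bin width gives the approximate expected bin content for the histogram.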

Programming  Problem  7.1:  Distribution  of  Lifetimes  Determined 
from  a  Small  Number  of  Radioactive  Decays 

In Example Program 7.1, an estimate $\bar t$ of the mean lifetime $\tau$ and its asymmetric errors $\Delta_-$ and $\Delta_+$ are found from a single small sample. In all cases the program yields $\Delta_- < \Delta_+$. Write a program that simulates a large number $n_{\mathrm{exp}}$ of experiments, in each of which $N$ radioactive decays of mean lifetime $\tau = 1$ are measured. Compute for each experiment the estimate $\bar t$ and construct a histogram of the quantity $\bar t$ for all experiments. Present this histogram $N_i(\bar t_i \le \bar t < \bar t_i + \Delta \bar t)$ and also the cumulative frequency distribution $h_i = (1/n_{\mathrm{exp}}) \sum_{k \le i} N_k$. Allow for interactive input of $n_{\mathrm{exp}}$ and $N$. Demonstrate that the distributions are asymmetric for small $N$ and that they become symmetric for large $N$. Show that for small $N$ the value $\bar t = 1$ is not the most probable value, but that it is the expectation value of $\bar t$. Determine for a fixed value of $N$, e.g., $N = 4$, limits $\Delta_-$ and $\Delta_+$ in such a way that $\bar t < 1 - \Delta_-$ holds with the probability $0.683/2$ and that with the same probability one has $\bar t > 1 + \Delta_+$. Compare the results found with a series of simulated experiments from the program E1MaxLike, Example Program 7.1. (Example solution: S1MaxLike)

Fig. G.5: Data points with errors of different size.
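A possible simulation for Programming Problem 7.1 relies on one standard fact: for exponentially distributed decay times, the maximum-likelihood estimate of $\tau$ is the arithmetic mean of the $N$ observed times. The sketch draws decay times by inverse transformation and stores the sorted estimates. It reads off empirical quantiles with a simple nearest-rank rule, which is an assumption and not necessarily the rule used in S1MaxLike. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class LifetimeExperiments {
    /** ML estimate of the mean lifetime from n decays with true tau: the sample mean. */
    static double estimate(int n, double tau, Random rnd) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // inverse-transform sampling of the exponential distribution
            sum += -tau * Math.log(1.0 - rnd.nextDouble());
        }
        return sum / n;
    }

    /** Sorted estimates from nExp simulated experiments. */
    static double[] simulate(int nExp, int n, double tau, Random rnd) {
        double[] est = new double[nExp];
        for (int k = 0; k < nExp; k++) est[k] = estimate(n, tau, rnd);
        Arrays.sort(est);
        return est;
    }

    /** Empirical q-quantile of sorted data (nearest-rank rule). */
    static double quantile(double[] sorted, double q) {
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        idx = Math.max(0, Math.min(sorted.length - 1, idx));
        return sorted[idx];
    }
}
```

Let $p$ be the probability prescribed in the problem. The limits then follow as $\Delta_- = 1 - $ `quantile(est, p)` and $\Delta_+ = $ `quantile(est, 1 - p)` $- 1$. For $N = 4$ the median of the estimates lies visibly below 1, while their mean stays close to 1.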

Programming  Problem  7.2:  Distribution  of  the  Sample  Correlation 
Coefficient 

Modify the class E2MaxLike so that instead of numerical output, a histogram of the correlation coefficient $r$ is presented (Fig. G.8). Produce histograms for $\rho = 0$ and $\rho = 0.95$, each for $n_{\mathrm{pt}} = 5, 50, 500$. Under what circumstances is the distribution asymmetric and why? Is this asymmetry in contradiction to the Central Limit Theorem? (Example solution: S2MaxLike)
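For Programming Problem 7.2, pairs from a bivariate Gaussian with correlation $\rho$ can be built from two independent standard normal numbers $z_1, z_2$. The construction is $x_1 = \sigma_1 z_1$ and $x_2 = \sigma_2(\rho z_1 + \sqrt{1-\rho^2}\,z_2)$, which gives $\mathrm{cov}(x_1, x_2) = \rho\sigma_1\sigma_2$. This is a sketch with hypothetical names and not the code of E2MaxLike. Calling `sampleR` repeatedly and histogramming the results gives plots like Fig. G.8.

```java
import java.util.Random;

class SampleCorrelation {
    /** Draws n pairs from a bivariate Gaussian (means 0, widths s1, s2, correlation rho)
     *  and returns the sample correlation coefficient r. */
    static double sampleR(int n, double s1, double s2, double rho, Random rnd) {
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) {
            double z1 = rnd.nextGaussian(), z2 = rnd.nextGaussian();
            x[i] = s1 * z1;
            y[i] = s2 * (rho * z1 + Math.sqrt(1.0 - rho * rho) * z2);
        }
        return correlation(x, y);
    }

    /** r = sum (x-xbar)(y-ybar) / sqrt(sum (x-xbar)^2 * sum (y-ybar)^2). */
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```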

Programming  Problem  9.1:  Fit  of  a  First-Degree  Polynomial  to  Data 
that  Correspond  to  a  Second-Degree  Polynomial 

In experimental or empirical studies one is often confronted with a large number of measurements or objects of the same kind (animals, elementary particle collisions, industrial products from a given production process, ...). The outcomes of the measurements performed on each object are described by some law. Certain assumptions are made about that law, which are to be checked by experiment.
Consider the following example. A series of measurements may contain $n_{\mathrm{exp}}$ experiments. Each experiment yields the measurements $y_i = x_1 + x_2 t_i + x_3 t_i^2 + \varepsilon_i$ for 10 values $t_i = 1, 2, \ldots, 10$ of the controlled variable $t$. The $\varepsilon_i$ are taken from a normal distribution with mean zero and width $\sigma$. In the analysis of the experiments it is assumed, however, that the true values $\eta_i$ underlying the measurements $y_i$ can be described by a first-degree polynomial $\eta_i = x_1 + x_2 t_i$. As result of the fit we obtain a minimum function $M$ from which we can compute the “$\chi^2$-probability” $P = 1 - F(M, n - r)$. Here $F(M, f)$ is the distribution function of a $\chi^2$ distribution with $f$ degrees of freedom, $n$ is the number of data points, and $r$ the number of parameters determined in the fit. If $P < \alpha$, then the fit of a first-degree polynomial to the data is rejected at a confidence level of $\beta = 1 - \alpha$.

Write  a  class  performing  the  following  steps: 

(i) Interactive input of $n_{\mathrm{exp}}$, $x_1$, $x_2$, $x_3$, $\sigma$, $\Delta y$.

(ii) Generation of $n_{\mathrm{exp}}$ sets of data $(t_i, y_i, \Delta y)$, fit of a first-degree polynomial to each set of data and computation of $P$. Entry of $P$ into a histogram.

(iii)  Graphical  representation  of  the  histogram. 


Fig. G.6: Histogram of 90 000 random numbers $x$, each of which is a sum of three uniformly distributed random numbers (panel: $n_{\mathrm{exp}} = 90\,000$, $N = 3$). The curve corresponds to the standard normal distribution. Significant differences between curve and histogram are visible only because of the very large number of random numbers used.
Suggestions: (a) Choose $n_{\mathrm{exp}} = 1000$, $x_1 = x_2 = 1$, $x_3 = 0$, $\sigma = \Delta y = 1$. As expected you will obtain a flat distribution for $P$. (b) Choose (keeping the other input quantities as above) different values $x_3 \ne 0$. You will observe a shift of the distribution towards small $P$ values, cf. Fig. G.9. Determine approximately the smallest positive value of $x_3$ such that the hypothesis of a first-degree polynomial is rejected at 90% confidence level in 95% of all experiments. (c) Choose $x_3 = 0$, but $\sigma \ne \Delta y$. You will again observe a shift in the distribution, e.g., towards larger $P$ values for $\Delta y > \sigma$. (d) From the experience gained in (a), (b), and (c), one might conclude that if erroneously too large measurement errors are assumed ($\Delta y > \sigma$), then a flat $P$ distribution would result. In this way one would get the impression that a first-degree polynomial could describe the data. Begin with $n_{\mathrm{exp}} = 1000$, $x_1 = x_2 = 1$, $x_3 = 0.2$, $\sigma = 1$, $\Delta y = 1$ and increase $\Delta y$ in steps of 0.1 up to $\Delta y = 2$. (Example solution: S1Lsq)
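One experiment of Programming Problem 9.1 can be analysed without the book's LsqLin class. The first-degree polynomial is fitted in closed form with a common error $\Delta y$. For an even number of degrees of freedom $f$, the $\chi^2$-probability has the exact finite-sum form

$$P = 1 - F(M, f) = e^{-M/2} \sum_{k=0}^{f/2-1} \frac{(M/2)^k}{k!}.$$

With $n = 10$ points and $r = 2$ parameters, $f = 8$ is even. The restriction to even $f$ is a simplification of this sketch, and the class name is hypothetical.

```java
class LinearFitChi2 {
    /** Fits eta = x1 + x2*t to (t_i, y_i) with common error dy; returns {x1, x2, M}. */
    static double[] fit(double[] t, double[] y, double dy) {
        int n = t.length;
        double mt = 0, my = 0;
        for (int i = 0; i < n; i++) { mt += t[i]; my += y[i]; }
        mt /= n;
        my /= n;
        double stt = 0, sty = 0;
        for (int i = 0; i < n; i++) {
            stt += (t[i] - mt) * (t[i] - mt);
            sty += (t[i] - mt) * (y[i] - my);
        }
        double x2 = sty / stt;
        double x1 = my - x2 * mt;
        double m = 0;
        for (int i = 0; i < n; i++) {
            double r = (y[i] - x1 - x2 * t[i]) / dy;
            m += r * r; // minimum function M
        }
        return new double[] {x1, x2, m};
    }

    /** P = 1 - F(M, f) for EVEN f only: exp(-M/2) * sum_{k < f/2} (M/2)^k / k!. */
    static double chi2Probability(double m, int f) {
        if (f <= 0 || f % 2 != 0) throw new IllegalArgumentException("f must be even and positive");
        double half = 0.5 * m, term = 1.0, sum = 1.0;
        for (int k = 1; k < f / 2; k++) {
            term *= half / k;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }
}
```

For each simulated data set one would call `fit`, then `chi2Probability(result[2], 8)`, and enter the value into the histogram of $P$.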

Programming  Problem  9.2:  Fit  of  a  Power  Law  (Linear  Case) 

A power law

$$\eta = x t^w$$

is linear in the parameter $x$ if $w$ is a constant. This function is to be fitted to measurements $(t_i, y_i)$ given by

$$t_i = t_0 + (i - 1)\Delta t\,, \quad i = 1, 2, \ldots, n\,, \qquad y_i = x t_i^w + \varepsilon_i\,.$$

Here the $\varepsilon_i$ follow a normal distribution centered about zero with width $\sigma$.

Fig. G.7: A histogram of 10 000 random numbers, each of which is the sum of a uniformly distributed random number and a normally distributed random number. The curve corresponds to the convolution of a uniform and a normal distribution.

Fig. G.8: Histogram of the sample correlation coefficient computed for 1000 samples of size 10 from a bivariate Gaussian distribution with the correlation coefficient $\rho = -0.8$ (panel: $a_1 = a_2 = 0$, $\sigma_1 = \sigma_2 = 1$, $\rho = -0.80$).

Write  a  class  performing  the  following  steps: 

(i) Interactive input of $n$, $t_0$, $\Delta t$, $x$, $w$, $\sigma$.

(ii)  Generation  of  measured  points. 

(iii)  Fit  of  the  power  law. 

(iv)  Graphical  display  of  the  data  and  the  fitted  function,  cf.  Fig.  G.10. 

(Example solution: S2Lsq)
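Because $\eta = x t^w$ is linear in $x$, the least-squares solution of Programming Problem 9.2 has a closed form. With $a_i = t_i^w$ and weights $g_i = 1/\sigma_i^2$:

$$\tilde x = \frac{\sum_i g_i a_i y_i}{\sum_i g_i a_i^2}, \qquad \Delta\tilde x = \Big(\sum_i g_i a_i^2\Big)^{-1/2}.$$

The sketch below evaluates these expressions directly. It is an independent illustration rather than a call to the book's LsqLin class, and the class name is hypothetical.

```java
class PowerLawLinearFit {
    /** Least-squares estimate of x in eta = x * t^w (w fixed) with errors sigma_i.
     *  Returns {x, dx, M}. */
    static double[] fit(double[] t, double[] y, double[] sigma, double w) {
        double sAA = 0, sAy = 0;
        for (int i = 0; i < t.length; i++) {
            double a = Math.pow(t[i], w);           // d eta / d x at t_i
            double g = 1.0 / (sigma[i] * sigma[i]); // weight of point i
            sAA += g * a * a;
            sAy += g * a * y[i];
        }
        double x = sAy / sAA;
        double dx = 1.0 / Math.sqrt(sAA);
        double m = 0; // minimum function
        for (int i = 0; i < t.length; i++) {
            double r = (y[i] - x * Math.pow(t[i], w)) / sigma[i];
            m += r * r;
        }
        return new double[] {x, dx, m};
    }
}
```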

Programming  Problem  9.3:  Fit  of  a  Power  Law  (Nonlinear  Case) 

If the power law has the form

$$\eta = x_1 t^{x_2}\,,$$

i.e., if the power itself is an unknown parameter, the problem becomes nonlinear. For the fit of a nonlinear function we have to start from a first approximation of the parameters. We limit ourselves to the case $t_i > 0$ for all $i$, which occurs frequently in practice. Then one has $\ln \eta = \ln x_1 + x_2 \ln t$. If instead of $(t_i, y_i)$ we now use $(\ln t_i, \ln y_i)$ as measured variables, we obtain a linear function in the parameters $\ln x_1$ and $x_2$. However, in this transformation the errors are distorted so that they are no longer Gaussian. We simply choose all errors to be of equal size and use the result of the linear fit as the first approximation of a nonlinear fit to the $(t_i, y_i)$. We still have to keep in mind that (in any case for $x_1 > 0$) one always has $\eta > 0$ for $t > 0$. Because of measurement errors, however, measured values $y_i \le 0$ can occur. Such points, for which the logarithm is undefined, must of course not be used for the computation of the first approximation.

Fig. G.9: Histogram of the $\chi^2$-probability for fits of a first-degree polynomial to 1000 data sets generated according to a second-degree polynomial.

Write  a  class  with  the  following  steps: 

(i) Interactive input of $n$, $t_0$, $\Delta t$, $x_1$, $x_2$, $\sigma$.

(ii) Generation of the $n$ measured points

$$t_i = t_0 + (i - 1)\Delta t\,, \quad i = 1, 2, \ldots, n\,, \qquad y_i = x_1 t_i^{x_2} + \varepsilon_i\,,$$

where $\varepsilon_i$ comes from a normal distribution centered about zero with width $\sigma$.

(iii) Computation of first approximations $x_1$, $x_2$ by fitting a linear function to $(\ln t_i, \ln y_i)$ with LsqLin.
Fig. G.10: Result of the fit of a parabola $y = xt^2$ to 20 measured points (fit result: $x = 1.01$, $\Delta x = 0.03$, $M = 23.83$, $P = 0.2028$).


(iv) Fit of a power law to $(t_i, y_i)$ with LsqNon.

(v) Graphical display of the results, cf. Fig. G.11.
(Example solution: S3Lsq)
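Step (iii) of Programming Problem 9.3 amounts to an unweighted straight-line fit of $\ln y_i$ against $\ln t_i$. Every point with $t_i \le 0$ or $y_i \le 0$ is skipped, because its logarithm is undefined. The sketch computes the slope and intercept in closed form instead of calling LsqLin. The class name is hypothetical.

```java
class PowerLawStart {
    /** First approximation {x1, x2} for eta = x1 * t^x2 from a straight-line fit of
     *  ln y against ln t with equal errors; points with t <= 0 or y <= 0 are skipped. */
    static double[] firstApproximation(double[] t, double[] y) {
        double su = 0, sv = 0, suu = 0, suv = 0;
        int n = 0;
        for (int i = 0; i < t.length; i++) {
            if (t[i] <= 0.0 || y[i] <= 0.0) continue; // logarithm undefined
            double u = Math.log(t[i]), v = Math.log(y[i]);
            su += u;
            sv += v;
            suu += u * u;
            suv += u * v;
            n++;
        }
        if (n < 2) throw new IllegalArgumentException("need at least two usable points");
        double x2 = (n * suv - su * sv) / (n * suu - su * su); // slope = exponent
        double lnX1 = (sv - x2 * su) / n;                      // intercept = ln x1
        return new double[] {Math.exp(lnX1), x2};
    }
}
```

The returned pair serves only as the starting point for the nonlinear fit in step (iv). The final estimates and their errors come from that fit.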


Programming  Problem  9.4:  Fit  of  a  Breit-Wigner  Function  to  Data  Points 
with  Errors 


For the $N = 21$ values $t_i = -3, -2.7, \ldots, 3$ of the controlled variable the measured values

$$y_i = f(t_i) + \varepsilon_i \tag{G.3.1}$$

are to be simulated. Here,

$$f(t) = \frac{2}{\pi x_2}\,\frac{x_2^2}{4(t - x_1)^2 + x_2^2} \tag{G.3.2}$$

is the Breit-Wigner function (3.3.32) with $a = x_1$ and $\Gamma = x_2$. The measurement errors $\varepsilon_i$ are to be taken from a normal distribution around zero with width $\sigma$. Choose $a = 0$ and $\Gamma = 1$. The points $(t_i, y_i)$ scatter within the measurement errors around a bell-shaped curve with a maximum at $t = a$. A bell-shaped curve with a maximum at the same position, however, could also be given by the Gaussian function

$$f(t) = \frac{1}{x_2\sqrt{2\pi}} \exp\left\{-\frac{(t - x_1)^2}{2x_2^2}\right\}. \tag{G.3.3}$$
Fig. G.11: Fit of a function $y = x_1 t^{x_2}$ to 20 data points (fit result: $x_1 = 1.12$, $x_2 = 1.76$, $\Delta x_1 = 0.07$, $\Delta x_2 = 0.12$, $\rho = -0.90$). The data points are identical to those in Fig. G.10.


Write  a  class  with  the  following  properties: 

(i) Interactive input of $\sigma$ and possibility to choose whether a Breit-Wigner function or a Gaussian is to be fitted.

(ii) Generation of the data, i.e., of the triplets of numbers $(t_i, y_i, \Delta y_i = \sigma)$.

(iii) Fit of the Breit-Wigner function (G.3.2) or of the Gaussian (G.3.3) to the data and computation of the minimum function $M$.

(iv)  Graphical  representation  of  the  measured  points  with  measurement  errors  and 
of  the  fitted  function,  cf.  Fig.  G.12. 

Run the program using different values of $\sigma$ and find out for which range of $\sigma$ the data allow a clear discrimination between the Breit-Wigner and Gaussian functions.
(Example  solution:  S4Lsq) 
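The two model functions (G.3.2) and (G.3.3) and the minimum function $M = \sum_i \big((y_i - f(t_i))/\Delta y_i\big)^2$ of Programming Problem 9.4 can be written down directly. The fit itself would be done with the book's nonlinear least-squares class, which is not reproduced here. The sketch only supplies the functions that such a fit, or a quick comparison of $M$ values, needs. The class name is hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

class PeakFunctions {
    /** Breit-Wigner function (G.3.2) with x1 = a (mean) and x2 = Gamma (FWHM). */
    static double breitWigner(double t, double x1, double x2) {
        double d = t - x1;
        return 2.0 / (Math.PI * x2) * x2 * x2 / (4.0 * d * d + x2 * x2);
    }

    /** Gaussian (G.3.3) with mean x1 and standard deviation x2. */
    static double gaussian(double t, double x1, double x2) {
        double d = t - x1;
        return Math.exp(-d * d / (2.0 * x2 * x2)) / (x2 * Math.sqrt(2.0 * Math.PI));
    }

    /** Minimum function M = sum ((y_i - f(t_i)) / dy_i)^2 for a given model f. */
    static double minimumFunction(double[] t, double[] y, double[] dy, DoubleUnaryOperator f) {
        double m = 0;
        for (int i = 0; i < t.length; i++) {
            double r = (y[i] - f.applyAsDouble(t[i])) / dy[i];
            m += r * r;
        }
        return m;
    }
}
```

Data for the problem follow from `breitWigner(t_i, 0, 1) + sigma * rnd.nextGaussian()` on the grid $t_i = -3 + 0.3\,i$, $i = 0, \ldots, 20$.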

Programming  Problem  9.5:  Asymmetric  Errors  and  Confidence  Region 
for  the  Fit  of  a  Breit-Wigner  Function 

Supplement  the  solution  of  Programming  Problem  9.4  such  that  it  yields  a  graphical 
representation  for  the  parameters,  their  errors,  covariance  ellipse,  asymmetric  errors, 
and confidence region similar to Fig. 9.11. Discuss the differences obtained when fitting a Breit-Wigner or a Gaussian, respectively. For each case try $\sigma = 0.1$ and $\sigma = 1$. (Example solution: S5Lsq)
Fig. G.12: Fit of a Gaussian to data points that correspond to a Breit-Wigner function (fit result: $x_1 = -0.05$, $x_2 = 0.62$, $M = 55.58$, $P = 0.000019$). The goodness-of-fit is poor.


Programming  Problem  9.6:  Fit  of  a  Breit-Wigner  Function  to  a  Histogram 

In Programming Problem 9.4 we started from measurements $y_i = f(t_i) + \varepsilon_i$. Here $f(t)$ was a Breit-Wigner function (3.3.32), and the measurement errors $\varepsilon_i$ corresponded to a Gaussian distribution centered about zero with width $\sigma$.

We now generate a sample of size $n_{\mathrm{ev}}$ from a Breit-Wigner distribution with mean $a = 0$ and full width at half maximum $\Gamma = 1$. We represent the sample by a histogram that is again characterized by triplets of numbers $(t_i, y_i, \Delta y_i)$. Now $t_i$ is the center of the $i$th bin $t_i - \Delta t/2 \le t < t_i + \Delta t/2$, and $y_i$ is the number of sample elements falling into this bin. For not too small $y_i$ the corresponding statistical error is $\Delta y_i = \sqrt{y_i}$. For small values $y_i$ this simple statement is problematic. It is completely wrong for $y_i = 0$. In the fit of a function to a histogram, care is therefore to be taken that empty bins (possibly also bins with few entries) are not considered as data points. The function to be fitted is

$$f(t) = x_3\,\frac{2}{\pi x_2}\,\frac{x_2^2}{4(t - x_1)^2 + x_2^2}\,. \tag{G.3.4}$$

The function is similar to (G.3.2) but there is an additional parameter $x_3$.

Write  a  class  with  the  following  steps: 

(i) Interactive input of $n_{\mathrm{ev}}$ (sample size) and $n_t$ (number of histogram bins). The lower limit of the histogram is to be fixed at $t = -3$, the upper limit at $t = 3$.
Fig. G.13: Fit of a Breit-Wigner function to a histogram (fit result: $x_1 = 0.02$, $x_2 = 0.87$, $x_3 = 28.1$, $\Delta x_1 = 0.07$, $\Delta x_2 = 0.16$, $\Delta x_3 = 3.3$).


(ii)  Generation  of  the  sample,  cf.  Programming  Problem  4.1. 

(iii)  Construction  of  the  histogram. 

(iv) Construction of the triplets $(t_i, y_i, \Delta y_i)$ to be used for the fit.

(v) Fit of the function (G.3.4).

(vi)  Output  of  the  results  in  numerical  and  graphical  form,  cf.  Fig.  G.13. 


Suggestions:  Perform  consecutive  fits  for  the  same  sample  but  different  numbers  of 
bins.  Try  to  find  an  optimum  for  the  number  of  bins.  (Example  solution:  S6Lsq) 
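Steps (ii) to (iv) of Programming Problem 9.6 are sketched below. The Breit-Wigner distribution function $F(t) = \frac12 + \frac1\pi \arctan\big(2(t-a)/\Gamma\big)$ is inverted to give $t = a + \frac{\Gamma}{2}\tan\big(\pi(u - \frac12)\big)$ for uniform $u$. The sample is then histogrammed on $[-3, 3)$, and the triplets are built with $\Delta y_i = \sqrt{y_i}$, never using empty bins. The threshold parameter `minEntries` for sparsely filled bins is an illustrative choice and not taken from S6Lsq. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class BreitWignerHistogram {
    /** Breit-Wigner random number by inverse transformation (mean a, FWHM gamma). */
    static double sample(double a, double gamma, Random rnd) {
        return a + 0.5 * gamma * Math.tan(Math.PI * (rnd.nextDouble() - 0.5));
    }

    /** Counts for nBins equal bins on [tMin, tMax); entries outside are dropped. */
    static int[] histogram(double[] data, int nBins, double tMin, double tMax) {
        int[] counts = new int[nBins];
        double width = (tMax - tMin) / nBins;
        for (double t : data) {
            if (t < tMin || t >= tMax) continue;
            int bin = (int) ((t - tMin) / width);
            if (bin >= nBins) bin = nBins - 1; // guard against rounding at the upper edge
            counts[bin]++;
        }
        return counts;
    }

    /** Triplets {t_i, y_i, dy_i = sqrt(y_i)}; bins with fewer than max(1, minEntries) are skipped. */
    static List<double[]> triplets(int[] counts, double tMin, double tMax, int minEntries) {
        List<double[]> out = new ArrayList<>();
        double width = (tMax - tMin) / counts.length;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] < Math.max(1, minEntries)) continue; // empty bins are never used
            double centre = tMin + (i + 0.5) * width;
            out.add(new double[] {centre, counts[i], Math.sqrt(counts[i])});
        }
        return out;
    }
}
```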


Programming  Problem  9.7 :  Fit  of  a  Circle  to  Points  with  Measurement 
Errors  in  Abscissa  and  Ordinate 

A total of $m$ data points $(s_i, t_i)$ are given in the $(s, t)$ plane. The measurement errors are defined by $2 \times 2$ covariance matrices of the form

$$C_i = \begin{pmatrix} \Delta s_i^2 & c_i \\ c_i & \Delta t_i^2 \end{pmatrix} \tag{G.3.5}$$

as in Example 9.11. Here $c_i = \Delta s_i \Delta t_i \rho_i$ and $\rho_i$ is the correlation coefficient between the measurement errors $\Delta s_i$, $\Delta t_i$. As in Example 9.11, construct the vector $y$ of measurements from the $s_i$ and $t_i$ and construct the covariance matrix $C_y$. Set up the equations $f_k(x, \eta) = 0$ assuming that the true positions underlying the measured points lie on a circle with center $(x_1, x_2)$ and radius $x_3$. Write a program with the following steps:

(i) Input of $m$, $\Delta s$, $\Delta t$, $\rho$.

(ii) Generation of $m$ measured points $(s_i, t_i)$ using bivariate normal distributions, the means of which are positioned at regular intervals on the unit circle ($x_1 = x_2 = 0$, $x_3 = 1$) and the covariance matrix of which is given by (G.3.5) with $\Delta s_i = \Delta s$, $\Delta t_i = \Delta t$, $c = \Delta s \Delta t \rho$.

(iii) Determination of a first approximation for $x_1$, $x_2$, $x_3$ by computation of the parameters of a circle through the first three measured points.

(iv)  Fit  to  all  measured  points  using  LsqGen  and  a  user  function  specially  written 
for  this  problem. 

(v)  Graphical  representation  of  the  measured  points  and  the  fitted  circle  as  in 
Fig.  G.14. 

(Example  solution:  S6Lsq) 
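Steps (ii) and (iii) of Programming Problem 9.7 are sketched below. Correlated errors with covariance (G.3.5) come from $\Delta s\, z_1$ and $\Delta t(\rho z_1 + \sqrt{1-\rho^2}\,z_2)$, where $z_1, z_2$ are independent standard normal numbers. The first approximation is the circumscribed circle of three points, computed with the usual circumcentre formula. The user function for LsqGen is not shown, and the class name `CircleStart` is hypothetical.

```java
import java.util.Random;

class CircleStart {
    /** Circle through three points; returns {x1, x2, x3} = {centre s, centre t, radius}. */
    static double[] throughThreePoints(double s1, double t1, double s2, double t2,
                                       double s3, double t3) {
        double d = 2.0 * (s1 * (t2 - t3) + s2 * (t3 - t1) + s3 * (t1 - t2));
        if (Math.abs(d) < 1e-12) throw new IllegalArgumentException("points are collinear");
        double q1 = s1 * s1 + t1 * t1, q2 = s2 * s2 + t2 * t2, q3 = s3 * s3 + t3 * t3;
        double cs = (q1 * (t2 - t3) + q2 * (t3 - t1) + q3 * (t1 - t2)) / d;
        double ct = (q1 * (s3 - s2) + q2 * (s1 - s3) + q3 * (s2 - s1)) / d;
        double r = Math.hypot(s1 - cs, t1 - ct);
        return new double[] {cs, ct, r};
    }

    /** m points around equally spaced positions on the unit circle with errors ds, dt
     *  and correlation rho, i.e. covariance (G.3.5) with c = ds * dt * rho. */
    static double[][] measuredPoints(int m, double ds, double dt, double rho, Random rnd) {
        double[][] p = new double[m][2];
        for (int i = 0; i < m; i++) {
            double phi = 2.0 * Math.PI * i / m;
            double z1 = rnd.nextGaussian(), z2 = rnd.nextGaussian();
            p[i][0] = Math.cos(phi) + ds * z1;
            p[i][1] = Math.sin(phi) + dt * (rho * z1 + Math.sqrt(1.0 - rho * rho) * z2);
        }
        return p;
    }
}
```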


Fig. G.14: Measured points with covariance ellipses, circle of the first approximation which is given by 3 measured points (broken line), and circle fitted to all measured points (fit result: $x_1 = 0.04$, $x_2 = 0.06$, $x_3 = 1.05$, $M = 21.32$, $P = 0.2122$).
Programming Problem 10.1: Monte Carlo Minimization to Choose
a Good First Approximation

For some functions in Example Program 10.1 the choice of the point $x_0$ defining the first approximation was decisive for success or failure of the minimization. If a function has several minima and if its value is smallest at one of them, that is, if an “absolute minimum” exists, the following procedure will work. One uses the Monte Carlo method to determine a first approximation $x_0$ of the absolute minimum in a larger region of the parameter space. To this end one generates points $x = (x_1, x_2, \ldots, x_n)$ according to a uniform distribution in that region and chooses the point at which the function has the smallest value.

On the basis of E1Min write a class that determines the absolute minimum of the function $f_7(x)$ described in Sect. 10.1. A first approximation $x_0$ within the region

$$-10 < x_{0i} < 10\,, \qquad i = 1, 2, 3\,,$$

is to be determined by the Monte Carlo method. Perform the search for the first approximation with $N$ points generated at random and allow for an interactive input of $N$. (Example solution: S1Min)
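The Monte Carlo search for a starting point in Programming Problem 10.1 is sketched below. It draws $N$ points uniformly in a box and keeps the one with the smallest function value. The function $f_7$ of Sect. 10.1 is not reproduced here. The usage check therefore relies on a hypothetical test function with two minima, $(x_1^2-4)^2 + 0.5x_1 + x_2^2 + x_3^2$, whose absolute minimum lies near $(-2, 0, 0)$. The class name is also hypothetical.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

class MonteCarloStart {
    /** Returns the point with the smallest function value among nPoints points drawn
     *  uniformly in the box lo[k] < x_k < hi[k]. */
    static double[] bestPoint(ToDoubleFunction<double[]> f, double[] lo, double[] hi,
                              int nPoints, Random rnd) {
        double[] best = null;
        double fBest = Double.POSITIVE_INFINITY;
        for (int j = 0; j < nPoints; j++) {
            double[] x = new double[lo.length];
            for (int k = 0; k < x.length; k++) x[k] = lo[k] + (hi[k] - lo[k]) * rnd.nextDouble();
            double fx = f.applyAsDouble(x);
            if (fx < fBest) { // keep the lowest value found so far
                fBest = fx;
                best = x;
            }
        }
        return best;
    }
}
```

The returned point is only a first approximation $x_0$. The minimization proper, as in E1Min, then starts from it.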

Programming Problem 10.2: Determination of the Parameters
of a Breit-Wigner Distribution from the Elements of a Sample

By modifying a copy of E2Min produce a class that simulates a sample from a Breit-Wigner distribution with mean $a$ and full width at half maximum $\Gamma$, and that subsequently determines the numerical values of these parameters by minimization of the negative log-likelihood function of the sample. Allow for interactive input of the sample size and the parameters $a$ and $\Gamma$ used in the simulation. (Example solution: S2Min)
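The function to be minimized in Programming Problem 10.2 is $-\ell(a, \Gamma) = -\sum_j \ln f(t_j; a, \Gamma)$. Here the Breit-Wigner density is written as $f(t) = 2\Gamma/\big(\pi(4(t-a)^2 + \Gamma^2)\big)$, which is (3.3.32) simplified. The sketch only evaluates this objective. Handing it to a minimizer, such as the one used in E2Min, is left out. Returning infinity for $\Gamma \le 0$ is a simple way, chosen here as an assumption, to keep a minimizer out of the forbidden region. The class name is hypothetical.

```java
class BreitWignerLikelihood {
    /** Negative log-likelihood -l(a, gamma) = -sum ln f(t_j; a, gamma) with
     *  f(t) = 2 gamma / (pi (4 (t - a)^2 + gamma^2)). */
    static double negLogLikelihood(double[] sample, double a, double gamma) {
        if (gamma <= 0) return Double.POSITIVE_INFINITY; // outside the allowed region
        double nll = 0;
        for (double t : sample) {
            double d = t - a;
            double f = 2.0 * gamma / (Math.PI * (4.0 * d * d + gamma * gamma));
            nll -= Math.log(f);
        }
        return nll;
    }
}
```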

Programming  Problem  11.1:  Two-Way  Analysis  of  Variance  with  Crossed 
Classification 

The model (11.2.16) for the data in an analysis of variance with crossed classification is

$$x_{ijk} = \mu + a_i + b_j + (ab)_{ij} + \varepsilon_{ijk}\,.$$

The analysis of variance tests the null hypothesis

$$a_i = b_j = (ab)_{ij} = 0\,.$$

Data corresponding to this null hypothesis are generated by the program E2Anova, Sect. 11.2, and are subsequently analyzed.

Write a program similar to E2Anova which generates data $x_{ijk}$ according to the above formula with

$$a_i = \ldots\,, \qquad b_j = \ldots\,, \qquad (ab)_{ij} = \mathrm{signum}(a_i)\,\mathrm{signum}(b_j)\,ab\,.$$
Fig. G.15: Data points with errors and regression polynomials of different degrees.


These relations fulfill the requirements (11.2.7). The $\varepsilon_{ijk}$, as in E2Anova, are to be drawn from a normal distribution with mean zero and standard deviation $\sigma$. Allow for interactive input of the quantities $a$, $b$, $ab$, $\sigma$, $\mu$. Perform an analysis of variance on the simulated data. Study different cases, e.g., $a = 0$, $b \ne 0$, $ab = 0$; $a \ne 0$, $b = 0$, $ab = 0$; $a = 0$, $b = 0$, $ab \ne 0$; etc. (Example solution: S1Anova)

Programming  Problem  11.2:  Two-Way  Analysis  of  Variance 
with  Nested  Classification 

Modify Programming Problem 11.1 for the treatment of a nested classification with data of the form (11.2.22), e.g.,

$$x_{ijk} = \mu + a_i + b_{ij} + \varepsilon_{ijk}\,,$$

and  use  the  relations 


(Example  solution:  S2Anova) 

Programming  Problem  12.1:  Simulation  of  Data  and  Plotting  Regression 
Polynomials  of  Different  Degrees 

Write a class which generates $n$ data points $(t_i, y_i)$. The $t_i$ are to be spread equidistantly over the interval $-1 \le t \le 1$; the $y_i$ are to correspond to a polynomial with $r$ terms and to have errors with standard deviation $\sigma$. A regression analysis is to be performed, and a plot of the data and the regression polynomials as in Fig. G.15 is to be produced. Your program may be largely based on the classes E4Reg and E2Reg. (Example solution: S1Reg)

Programming  Problem  12.2:  Simulation  of  Data  and  Plotting  the 
Regression  Line  with  Confidence  Limits 

Extend the solution of Programming Problem 12.1 so that a regression polynomial of the desired degree is shown together with confidence limits, corresponding to Sect. 12.2. (Example solution: S2Reg)


Programming  Problem  13.1:  Extrapolation  in  a  Time  Series  Analysis 

In Sect. 13.3 we have stressed that one must be very cautious with the interpretation of the results of a time series analysis at the edge of a time series, and with extrapolation into regions outside the time series. In particular, we found that the extrapolation yields meaningless results if the data are not at least approximately described by a polynomial. The degree of this polynomial must be smaller than or equal to the degree of the polynomial used for the time series analysis.

Study these statements by simulating a number of time series and analyzing them. Write a program, starting from E2TimSer, that for $n = 200$, $i = 1, 2, \ldots, n$, generates data of the form

$$y_i = \left(\frac{i - 100}{50}\right)^m + \varepsilon_i\,.$$

Here the $\varepsilon_i$ are to be generated according to a normal distribution with mean zero and standard deviation $\sigma$. Allow for the interactive input of $m$, $\sigma$, $k$, $l$, and $P$. After generating the data perform a time series analysis and produce a plot as in E2TimSer. Study different combinations of $m$, $k$, and $l$, and for each combination use small values of $\sigma$ (e.g., $\sigma = 0.001$) and large values of $\sigma$ (e.g., $\sigma = 0.1$). (Example solution: S1TimSer)
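The statement about polynomial degrees in Programming Problem 13.1 can be checked in a simple special case. This sketch assumes the data form $y_i = \big((i-100)/50\big)^m$ (noise omitted; add `sigma * rnd.nextGaussian()` for $\varepsilon_i$). It uses only the unweighted moving average over $2k+1$ points. That average coincides with a local polynomial fit of degree $l = 0$ or $l = 1$ evaluated at the window centre, because $\sum_j j = 0$ over a symmetric window. E2TimSer handles general $l$ and the edge regions, which this sketch leaves undefined (NaN). The class name is hypothetical.

```java
import java.util.Arrays;

class PolynomialSeries {
    /** Noise-free y_i = ((i - 100) / 50)^m for i = 1..n (stored at index i - 1). */
    static double[] generate(int n, int m) {
        double[] y = new double[n];
        for (int i = 1; i <= n; i++) y[i - 1] = Math.pow((i - 100) / 50.0, m);
        return y;
    }

    /** Unweighted moving average over 2k+1 points (local fit of degree l <= 1 at the centre).
     *  The first and last k points are left as NaN instead of being extrapolated. */
    static double[] movingAverage(double[] y, int k) {
        double[] out = new double[y.length];
        Arrays.fill(out, Double.NaN);
        for (int c = k; c < y.length - k; c++) {
            double s = 0;
            for (int j = -k; j <= k; j++) s += y[c + j];
            out[c] = s / (2 * k + 1);
        }
        return out;
    }
}
```

For $m = 1$ the average reproduces the data exactly. For $m = 2$ it is biased by $\frac{1}{2500}\cdot\frac{k(k+1)}{3}$ at every interior point, because the data degree exceeds $l$.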


Programming  Problem  13.2:  Discontinuities  in  Time  Series 

In the development of time series analysis it was assumed that the measurements, apart from their statistical fluctuations, are continuous functions of time. We therefore expect unreliable results in regions where the measurements are discontinuous. Write a program that generates the following three types of time series, analyzes them, and displays the results graphically. One of them is continuous; the other two contain discontinuities.

Sine function:

$$y_i = \sin(\pi t_i/180) + \varepsilon_i\,, \qquad t_i = i\,, \quad i = 1, 2, \ldots, n\,.$$

Step function:

$$y_i = \begin{cases} -1 + \varepsilon_i\,, & t_i \le 100\,,\\ 1 + \varepsilon_i\,, & t_i > 100\,,\end{cases} \qquad t_i = i \bmod 200\,, \quad i = 1, 2, \ldots, n\,.$$

Sawtooth function:

$$y_i = (t_i - 50)/100 + \varepsilon_i\,, \qquad t_i = i \bmod 100\,, \quad i = 1, 2, \ldots, n\,.$$

The $\varepsilon_i$ are again to be generated according to a normal distribution with mean zero and standard deviation $\sigma$. Allow for the choice of one of the functions and for the interactive input of $n$, $\sigma$, $k$, $l$, and $P$. Study the time series using different values for the parameters and discuss the results. Figure G.16 shows an example. (Example solution: S2TimSer)

Fig. G.16: Time series corresponding to a step function with moving averages and confidence limits.
Probability

$A, B, \ldots$ are events; $\bar A$ is the event “not $A$”.

$(A + B)$ and $(AB)$ combine events with logical “or” and logical “and”, respectively.

$P(A)$ is the probability of the event $A$.

$P(B|A) = P(AB)/P(A)$ is the probability for $B$ given the condition $A$ (conditional probability).

The following rules hold:

For every event $A$

$$P(\bar A) = 1 - P(A)\,,$$

for mutually exclusive events $A, B, \ldots, Z$

$$P(A + B + \cdots + Z) = P(A) + P(B) + \cdots + P(Z)\,,$$

for independent events $A, B, \ldots, Z$

$$P(AB \cdots Z) = P(A)P(B) \cdots P(Z)\,.$$

Single Random Variable

Distribution function: $F(x) = P(\mathrm{x} < x)$

Probability density (for $F(x)$ differentiable): $f(x) = F'(x) = \mathrm{d}F(x)/\mathrm{d}x$

Moments of order $\ell$:

(a) About the point $c$: $\alpha_\ell = E\{(\mathrm{x} - c)^\ell\}$

(b) About the origin: $\lambda_\ell = E\{\mathrm{x}^\ell\}$

(c) About the mean (central moments): $\mu_\ell = E\{(\mathrm{x} - \hat x)^\ell\}$
Table H.1: Expectation values for discrete and continuous distributions.

Quantity | $x$ discrete | $x$ continuous; $F(x)$ differentiable
Probability density | not defined | $f(x) = F'(x) = \mathrm{d}F(x)/\mathrm{d}x$, $\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$
Mean of $x$ (expectation value) | $\hat x = E(x) = \sum_i x_i P(\mathrm{x} = x_i)$ | $\hat x = E(x) = \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x$
Mean of the function $H(x)$ | $E\{H(x)\} = \sum_i H(x_i) P(\mathrm{x} = x_i)$ | $E\{H(x)\} = \int_{-\infty}^{\infty} H(x) f(x)\,\mathrm{d}x$

Variance: $\sigma^2(x) = \mathrm{var}(x) = \mu_2 = E\{(x - \hat x)^2\}$

Standard deviation or error of $x$: $\Delta x = \sigma(x) = +\sqrt{\sigma^2(x)}$

Skewness: $\gamma = \mu_3/\sigma^3$

Reduced variable: $u = (x - \hat x)/\sigma(x)$; $E(u) = 0$, $\sigma^2(u) = 1$

Mode (most probable value) $x_{\mathrm m}$ defined by: $P(\mathrm{x} = x_{\mathrm m}) = \max$
Median $x_{0.5}$ defined by: $F(x_{0.5}) = P(\mathrm{x} < x_{0.5}) = 0.5$
Quantile $x_q$ defined by: $F(x_q) = P(\mathrm{x} < x_q) = q$; $0 \le q \le 1$

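Several Random Variables

Distribution function:

$$F(x) = F(x_1, x_2, \ldots, x_n) = P(\mathrm{x}_1 < x_1, \mathrm{x}_2 < x_2, \ldots, \mathrm{x}_n < x_n)$$

Joint probability density (only for $F(x)$ differentiable with respect to all variables):

$$f(x) = f(x_1, x_2, \ldots, x_n) = \partial^n F(x_1, x_2, \ldots, x_n)/\partial x_1 \partial x_2 \cdots \partial x_n$$

Marginal probability density of the variable $x_i$:

$$g_i(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,\mathrm{d}x_1\,\mathrm{d}x_2 \cdots \mathrm{d}x_{i-1}\,\mathrm{d}x_{i+1} \cdots \mathrm{d}x_n$$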
Expectation value of a function $H(x)$:

$$E\{H(x)\} = \int H(x) f(x)\,\mathrm{d}x$$

Expectation value of the variable $x_i$:

$$\hat x_i = E(x_i) = \int x_i f(x)\,\mathrm{d}x = \int_{-\infty}^{\infty} x_i\, g_i(x_i)\,\mathrm{d}x_i$$

The variables $x_1, x_2, \ldots, x_n$ are independent if

$$f(x_1, x_2, \ldots, x_n) = g_1(x_1)\, g_2(x_2) \cdots g_n(x_n)$$


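Moments of order $\ell_1, \ell_2, \ldots, \ell_n$:

(a) About $c = (c_1, c_2, \ldots, c_n)$: $\alpha_{\ell_1 \ell_2 \ldots \ell_n} = E\{(x_1 - c_1)^{\ell_1} (x_2 - c_2)^{\ell_2} \cdots (x_n - c_n)^{\ell_n}\}$

(b) About the origin: $\lambda_{\ell_1 \ell_2 \ldots \ell_n} = E\{x_1^{\ell_1} x_2^{\ell_2} \cdots x_n^{\ell_n}\}$

(c) About $\hat x$: $\mu_{\ell_1 \ell_2 \ldots \ell_n} = E\{(x_1 - \hat x_1)^{\ell_1} (x_2 - \hat x_2)^{\ell_2} \cdots (x_n - \hat x_n)^{\ell_n}\}$

Variance of $x_i$: $\sigma^2(x_i) = E\{(x_i - \hat x_i)^2\} = c_{ii}$

Covariance between $x_i$ and $x_j$: $\mathrm{cov}(x_i, x_j) = E\{(x_i - \hat x_i)(x_j - \hat x_j)\} = c_{ij}$

For $x_i$, $x_j$ independent: $\mathrm{cov}(x_i, x_j) = 0$

Covariance matrix: $C = E\{(x - \hat x)(x - \hat x)^{\mathrm T}\}$

Correlation coefficient:

$$\rho(x_1, x_2) = \mathrm{cov}(x_1, x_2)/\sigma(x_1)\sigma(x_2)\,; \qquad -1 \le \rho \le 1$$

Rules of computation:

$$\sigma^2(c x_i) = c^2 \sigma^2(x_i)\,,$$

$$\sigma^2(a x_i + b x_j) = a^2 \sigma^2(x_i) + b^2 \sigma^2(x_j) + 2ab\,\mathrm{cov}(x_i, x_j)\,;$$

$a, b, c$ are constants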


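Transformation of Variables

Original variables: $x = (x_1, x_2, \ldots, x_n)$

Probability density: $f(x)$

Transformed variables: $y = (y_1, y_2, \ldots, y_n)$
Mapping: $y_1 = y_1(x)$, $y_2 = y_2(x)$, ..., $y_n = y_n(x)$
Probability density: $g(y) = |J| f(x)$

with the Jacobian determinant

$$J = J\left(\frac{x_1, x_2, \ldots, x_n}{y_1, y_2, \ldots, y_n}\right) = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix}$$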

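Error Propagation

The original variables $x$ have the covariance matrix $C_x$. The covariance matrix of the transformed variables $y$ is

$$C_y = T C_x T^{\mathrm T} \quad\text{with}\quad T = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{pmatrix}.$$

The formula is only exact for a linear relation between $y$ and $x$, but is a good approximation for small deviations from linearity in a region around $\hat x$ of the magnitude of the standard deviation. Only for vanishing covariances in $C_x$ does one have

$$\sigma(y_i) = \Delta y_i = \sqrt{\sum_{j=1}^{n} \left(\frac{\partial y_i}{\partial x_j}\right)^2 (\Delta x_j)^2}\,.$$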
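The matrix product $C_y = T C_x T^{\mathrm T}$ is straightforward to evaluate numerically. In the sketch, the caller supplies the Jacobian $T$ (an $m \times n$ array of partial derivatives). For $T = (a, b)$ the result reduces to the rule $\sigma^2(a x_i + b x_j) = a^2\sigma^2(x_i) + b^2\sigma^2(x_j) + 2ab\,\mathrm{cov}(x_i, x_j)$ given above. The class name is hypothetical.

```java
class ErrorPropagation {
    /** C_y = T C_x T^T for an m x n matrix T and an n x n covariance matrix C_x. */
    static double[][] propagate(double[][] t, double[][] cx) {
        int m = t.length, n = cx.length;
        double[][] tc = new double[m][n]; // T C_x
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++) tc[i][j] += t[i][k] * cx[k][j];
        double[][] cy = new double[m][m]; // (T C_x) T^T
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++) cy[i][j] += tc[i][k] * t[j][k];
        return cy;
    }
}
```

For example, $y = 2x_1 + 3x_2$ with $\sigma^2(x_1) = 1$, $\sigma^2(x_2) = 4$ and $\mathrm{cov}(x_1, x_2) = 0.5$ gives $\sigma^2(y) = 4 + 36 + 6 = 46$.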


The Law of Large Numbers

A total of $n$ observations are carried out, which are characterized by the random variable $x_i$ ($= 1$ if on the $i$th observation the event $A$ occurs, otherwise $= 0$). The frequency of $A$ is

$$h = \frac{1}{n} \sum_{i=1}^{n} x_i\,,$$

with

$$E(h) = \hat h = p\,, \qquad \sigma^2(h) = \frac{1}{n}\,p(1 - p)\,.$$

For $n \to \infty$ this frequency is therefore equal to the probability $p$ for the occurrence of $A$.
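Table H.2: Distributions of discrete variables.

Distribution | Probability for observing $x = k$ ($x_1 = k_1, \ldots, x_l = k_l$) | Mean | Variance (elements of covariance matrix)
Binomial | $W_k^n = \binom{n}{k} p^k (1 - p)^{n-k}$ | $\hat x = np$ | $\sigma^2(x) = np(1 - p)$
Multinomial, $\sum_{j=1}^{l} p_j = 1$ | $W^n_{k_1, k_2, \ldots, k_l} = n! \prod_{j=1}^{l} \dfrac{p_j^{k_j}}{k_j!}$ | $\hat x_j = np_j$ | $c_{ij} = np_i(\delta_{ij} - p_j)$
Hypergeometric, $L = N - K$, $l = n - k$ | $W_k = \binom{K}{k}\binom{L}{l} \Big/ \binom{N}{n}$ | $\hat x = nK/N$ | $\sigma^2(x) = \dfrac{nKL(N - n)}{N^2(N - 1)}$
Poisson | $f(k) = \dfrac{\lambda^k}{k!}\,e^{-\lambda}$ | $\hat x = \lambda$ | $\sigma^2(x) = \lambda$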

Central Limit Theorem

If $x_i$ are independent variables with mean $a$ and variance $\sigma^2$, then $(1/n) \sum_{i=1}^{n} x_i$ for $n \to \infty$ follows a normal distribution with mean $a$ and variance $\sigma^2/n$.

Convolutions of Distributions

The probability density of the sum $u = x + y$ of two independent random variables $x$ and $y$ is

$$f_u(u) = \int_{-\infty}^{\infty} f_x(x)\, f_y(u - x)\,\mathrm{d}x = \int_{-\infty}^{\infty} f_y(y)\, f_x(u - y)\,\mathrm{d}y\,.$$

- The convolution of two Poisson distributions with the parameters $\lambda_1$ and $\lambda_2$ is a Poisson distribution with the parameter $\lambda_1 + \lambda_2$.

- The convolution of two Gaussian distributions with means $a_1$, $a_2$ and variances $\sigma_1^2$, $\sigma_2^2$ is a Gaussian distribution with mean $a = a_1 + a_2$ and variance $\sigma^2 = \sigma_1^2 + \sigma_2^2$.
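Table H.3: Distributions of continuous variables.

Distribution | Probability density | Mean | Variance (covariance matrix)
Uniform | $f(x) = 0$ for $x < a$, $x > b$; $f(x) = 1/(b - a)$ for $a \le x \le b$ | $\frac12(b + a)$ | $(b - a)^2/12$
Gaussian | $f(x) = \dfrac{1}{b\sqrt{2\pi}} \exp\left\{-\dfrac{(x - a)^2}{2b^2}\right\}$ | $a$ | $b^2$
Standard Gaussian | $f(x) = \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{x^2}{2}\right)$ | $0$ | $1$
Gaussian of several variables | $f(x) = k \exp\left\{-\frac12 (x - a)^{\mathrm T} B (x - a)\right\}$ | $a$ | $C = B^{-1}$
$\chi^2$ | $f(\chi^2) = \dfrac{1}{\Gamma(\frac12 f)\, 2^{f/2}}\, (\chi^2)^{\frac12 f - 1} \exp\left(-\frac12 \chi^2\right)$ | $f$ | $2f$
Fisher's $F$ | $f(F) = \left(\dfrac{f_1}{f_2}\right)^{f_1/2} \dfrac{\Gamma(\frac12(f_1 + f_2))}{\Gamma(\frac12 f_1)\,\Gamma(\frac12 f_2)}\, F^{\frac12 f_1 - 1} \left(1 + \dfrac{f_1}{f_2} F\right)^{-\frac12(f_1 + f_2)}$ | $\dfrac{f_2}{f_2 - 2}$, $f_2 > 2$ | $\dfrac{2f_2^2(f_1 + f_2 - 2)}{f_1(f_2 - 2)^2(f_2 - 4)}$, $f_2 > 4$
Student's $t$ | $f(t) = \dfrac{\Gamma(\frac12(f + 1))}{\Gamma(\frac12 f)\sqrt{\pi f}} \left(1 + \dfrac{t^2}{f}\right)^{-\frac12(f + 1)}$ | $0$ | $\dfrac{f}{f - 2}$, $f > 2$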


- The convolution of two $\chi^2$ distributions with $f_1$ and $f_2$ degrees of freedom is a $\chi^2$ distribution with $f = f_1 + f_2$ degrees of freedom.


Samples 

Population:  An  infinite  (in  some  cases  finite)  set  of  elements  described  by  a 
discrete  or  continuously  distributed  random  variable  x. 

(Random) sample of size $N$: A selection of $N$ elements $(x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ from the population. (For the requirements for a sample to be random see Sect. 6.1.)
Table H.4: Samples from different populations. In each row the first entry refers to a sample of size $n$ from a continuously distributed population, the second to a sample of size $n$ from a discrete population of size $N$ (the variable of the population is $y$, the variable of the sample is $x$).

Population mean: $E(x) = \hat{x}$; $\bar{y} = \dfrac{1}{N}\sum_{j=1}^{N} y_j$

Population variance: $\sigma^2(x)$; $\sigma^2(y) = \dfrac{1}{N-1}\sum_{j=1}^{N}(y_j - \bar{y})^2$

Sample mean (both cases): $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

Variance of the sample mean: $\sigma^2(\bar{x}) = \dfrac{1}{n}\,\sigma^2(x)$; $\sigma^2(\bar{x}) = \dfrac{\sigma^2(y)}{n}\left(1 - \dfrac{n}{N}\right)$

Variance of the sample (both cases): $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
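The finite-population entries can be verified exactly by enumerating every sample of size $n$ drawn without replacement, since all such samples are equally likely. In the sketch below the population values are an arbitrary illustrative choice; it also confirms that the average of $s^2$ over all samples equals $\sigma^2(y)$ when $\sigma^2(y)$ is defined with $1/(N-1)$, as in the table.

```python
from itertools import combinations
from statistics import fmean, variance


def finite_population_check(y, n):
    """Exact variance of the sample mean over all samples of size n (without
    replacement) versus the formula sigma^2(y)/n * (1 - n/N)."""
    N = len(y)
    ybar = fmean(y)
    var_y = sum((v - ybar) ** 2 for v in y) / (N - 1)   # 1/(N-1) convention of Table H.4
    samples = list(combinations(y, n))                    # all equally likely samples
    means = [fmean(s) for s in samples]
    exact = sum((m - ybar) ** 2 for m in means) / len(samples)
    formula = var_y / n * (1 - n / N)
    mean_s2 = fmean(variance(s) for s in samples)         # average of s^2 over all samples
    return exact, formula, mean_s2, var_y


print(finite_population_check([2, 3, 5, 7, 11, 13], 3))
```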

Distribution function of the sample: $W_N(x) = n_x/N$, where $n_x$ is the number of elements in the sample for which $x^{(i)} < x$.

Statistic: An arbitrary function of the elements of a sample,

$$S = S(x^{(1)}, x^{(2)}, \ldots, x^{(N)})\ .$$

Estimator: A statistic used to estimate a parameter $\lambda$ of the population. An estimator is unbiased if $E(S) = \lambda$, and consistent if

$$\lim_{N\to\infty}\sigma(S) = 0\ .$$

Maximum-Likelihood  Method 

Consider a population described by the probability density $f(x, \lambda)$, where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ is a set of parameters. If a sample $x^{(1)}, x^{(2)}, \ldots, x^{(N)}$ is obtained, then the likelihood function is $L = \prod_{j=1}^{N} f(x^{(j)}, \lambda)$ and the log-likelihood function is $\ell = \ln L$. In order to determine the unknown parameters $\lambda$ from the sample, the maximum-likelihood method prescribes the value of $\lambda$ for which $L$ (or $\ell$) is a maximum. That is, one must solve the likelihood equations $\partial\ell/\partial\lambda_i = 0$, $i = 1, 2, \ldots, p$, or (for only one parameter) $d\ell/d\lambda = \ell' = 0$.

Information of a sample: $I(\lambda) = E(\ell'^2) = -E(\ell'')$

Information inequality: $\sigma^2(S) \ge \{1 + B'(\lambda)\}^2/I(\lambda)$

Here $S$ is an estimator for $\lambda$, and $B(\lambda) = E(S) - \lambda$ is its bias.

An estimator has minimum variance if $\ell' = A(\lambda)(S - E(S))$, where $A(\lambda)$ does not depend on the sample.

The maximum-likelihood estimator $\tilde{\lambda}$, i.e., the solution of the likelihood equation, is unique, asymptotically (i.e., for $N \to \infty$) unbiased, and has minimum variance.


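The asymptotic form of the likelihood function for one parameter $\lambda$ is

$$L = \text{const}\cdot\exp\left\{-\frac{(\lambda - \tilde{\lambda})^2}{2b^2}\right\},\qquad b^2 = \sigma^2(\tilde{\lambda}) = \frac{1}{E(\ell'^2(\tilde{\lambda}))} = -\frac{1}{E(\ell''(\tilde{\lambda}))}\ ,$$

and for several parameters

$$L = \text{const}\cdot\exp\left\{-\tfrac{1}{2}(\lambda - \tilde{\lambda})^{\mathrm T}B(\lambda - \tilde{\lambda})\right\}$$

with the covariance matrix $C = B^{-1}$ and

$$B_{ij} = -E\left(\frac{\partial^2\ell}{\partial\lambda_i\,\partial\lambda_j}\right)_{\lambda = \tilde{\lambda}}\ .$$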
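As a concrete illustration (not one of the cases treated above), take the exponential density $f(x; \lambda) = \lambda^{-1}e^{-x/\lambda}$, for which $\ell' = -N/\lambda + \sum x/\lambda^2$. The sketch below solves the likelihood equation by Newton's method and estimates $b$ from $-1/\ell''(\tilde{\lambda})$, i.e., using the observed second derivative in place of its expectation (for this density both give $N/\tilde{\lambda}^2$). It assumes all $x > 0$; started at $\min x$, below the solution, the iteration rises monotonically to $\tilde{\lambda} = \bar{x}$.

```python
import math


def derivatives(lam, xs):
    """First and second derivative of l = -N ln(lam) - sum(x)/lam."""
    n, s = len(xs), sum(xs)
    return -n / lam + s / lam**2, n / lam**2 - 2 * s / lam**3


def ml_exponential(xs, tol=1e-12, max_iter=100):
    """Solve l'(lam) = 0 by Newton's method; return the estimate and b = sqrt(-1/l'')."""
    lam = min(xs)                      # start below the solution (all x assumed > 0)
    for _ in range(max_iter):
        d1, d2 = derivatives(lam, xs)
        step = d1 / d2
        lam -= step
        if abs(step) < tol * lam:
            break
    _, d2 = derivatives(lam, xs)
    return lam, math.sqrt(-1.0 / d2)


xs = [0.5, 1.2, 2.3, 0.7, 3.1, 1.0, 0.4, 2.2]
lam, b = ml_exponential(xs)
print(lam, b)   # lam equals the sample mean; b equals lam / sqrt(N)
```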


Testing  Hypotheses 

Null hypothesis $H_0(\lambda = \lambda_0)$: Assumption of values for the parameters $\lambda$ that determine the probability distribution $f(x, \lambda)$ of a population.

Alternative hypotheses $H_1(\lambda = \lambda_1)$, $H_2(\lambda = \lambda_2)$, ...: Other possibilities for $\lambda$, against which the null hypothesis is to be tested by consideration of a sample $X = (x^{(1)}, x^{(2)}, \ldots, x^{(N)})$ from the population.

A hypothesis is simple if the parameters are completely determined, e.g., $H_0(\lambda_1 = 1, \lambda_2 = 5)$; otherwise it is composite, e.g., $H_1(\lambda_1 = 2, \lambda_2 < 7)$.

Test of a hypothesis $H_0$ with a significance level $\alpha$ or confidence level $1 - \alpha$: $H_0$ is rejected if $X \in S_c$, where $S_c$ is the critical region in the sample space and

$$P(X \in S_c \mid H_0) = \alpha\ .$$
Error of the first kind: Rejection of $H_0$, although $H_0$ is true. The probability of this error is $\alpha$.

Error of the second kind: $H_0$ is not rejected, although $H_1$ is true. The probability of this error is $P(X \notin S_c \mid H_1) = \beta$.

Power function:

$$M(S_c, \lambda) = P(X \in S_c \mid H) = P(X \in S_c \mid \lambda)\ .$$

Operating characteristic function:

$$L(S_c, \lambda) = 1 - M(S_c, \lambda)\ .$$

A most powerful test of $H_0$ with respect to $H_1$ has $M(S_c, \lambda_1) = 1 - \beta = \max$. A uniformly most powerful test is a most powerful test with respect to all possible $H_1$.

An unbiased test has $M(S_c, \lambda_1) \ge \alpha$ for all possible $H_1$.
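For case 1 of Table H.5 with the one-sided hypothesis $H_0(\lambda \le \lambda_0)$, the statistic $T = (\bar{x} - \lambda_0)/(\sigma/\sqrt{N})$ is normal with unit variance and mean $(\lambda - \lambda_0)\sqrt{N}/\sigma$, so the power function is $M(S_c, \lambda) = 1 - \psi_0(\Omega(1-\alpha) - (\lambda - \lambda_0)\sqrt{N}/\sigma)$. A minimal sketch using Python's `statistics.NormalDist`; the function name and example values are illustrative. It shows $M = \alpha$ at $\lambda = \lambda_0$ and $M \ge \alpha$ for $\lambda > \lambda_0$, i.e., the test is unbiased.

```python
from statistics import NormalDist


def power_one_sided(lam, lam0, sigma, n, alpha):
    """M(S_c, lam) for H0: lam <= lam0, rejected if (xbar - lam0)/(sigma/sqrt(n)) > Omega(1 - alpha)."""
    z = NormalDist()                         # standard normal distribution
    crit = z.inv_cdf(1 - alpha)              # Omega(1 - alpha)
    shift = (lam - lam0) * n**0.5 / sigma    # mean of T when the true parameter is lam
    return 1 - z.cdf(crit - shift)


for lam in (0.0, 0.2, 0.5, 1.0):
    print(lam, round(power_one_sided(lam, 0.0, 1.0, 25, 0.05), 3))
```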

Neyman-Pearson lemma: A test of $H_0$ with respect to $H_1$ (both simple hypotheses) with the critical region $S_c$ is a most powerful test if $f(X|H_0)/f(X|H_1) \le c$ for $X \in S_c$ and $\ge c$ for $X \notin S_c$, where $c$ is a constant depending only on $\alpha$.

Test statistic $T(X)$: Scalar function of the sample $X$. By means of a mapping $X \to T(X)$, $S_c(X) \to U$, the question as to whether $X \in S_c$ can be reformulated as $T \in U$.

Likelihood-ratio test: If $\omega$ denotes the region in the parameter space corresponding to the null hypothesis and $\Omega$ denotes the entire possible parameter region, then the test statistic

$$T = f(x; \tilde{\lambda}^{(\Omega)})\big/f(x; \tilde{\lambda}^{(\omega)})$$

is used. Here $\tilde{\lambda}^{(\Omega)}$ and $\tilde{\lambda}^{(\omega)}$ are the maximum-likelihood estimators in the regions $\Omega$ and $\omega$. $H_0$ is rejected if $T > T_{1-\alpha}$ with $P(T > T_{1-\alpha} \mid H_0) = \int_{T_{1-\alpha}}^{\infty} g(T)\,dT = \alpha$, where $g(T)$ is the conditional probability density of $T$ for a given $H_0$.

Wilks theorem (holds under weak requirements concerning the probability density of the population): If $H_0$ specifies $p - r$ out of the $p$ parameters, then $2\ln T$ (where $T \ge 1$ is the likelihood-ratio test statistic defined above) follows a $\chi^2$-distribution with $f = p - r$ degrees of freedom in the limit $N \to \infty$.
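A minimal sketch of the likelihood-ratio test for an illustrative case: $N$ Poisson counts with $H_0(\lambda = \lambda_0)$, so $p = 1$, $r = 0$ and $f = 1$. The maximum-likelihood estimate in $\Omega$ is the sample mean, the factorials cancel in $T$, and $2\ln T = 2N[\bar{k}\ln(\bar{k}/\lambda_0) - (\bar{k} - \lambda_0)]$. The critical value 3.841 is $\chi^2_{0.95}$ for $f = 1$ from Table I.7; the function name and counts are hypothetical.

```python
import math


def poisson_lr_test(counts, lam0, chi2_crit=3.841):
    """2 ln T for H0: lam = lam0 in a Poisson sample; chi2_crit is chi^2_{0.95} for f = 1."""
    n = len(counts)
    lam_hat = sum(counts) / n               # ML estimate in the full region Omega
    # ln T = sum_i [k_i ln(lam_hat/lam0) - (lam_hat - lam0)]; the k_i! terms cancel
    log_ratio = -(lam_hat - lam0) * n
    if lam_hat > 0:                         # k ln(...) contributes nothing when all k = 0
        log_ratio += n * lam_hat * math.log(lam_hat / lam0)
    stat = 2 * log_ratio
    return stat, stat > chi2_crit


counts = [3, 5, 4, 6, 2, 4, 5, 3]
print(poisson_lr_test(counts, 4.0))   # sample mean is 4, so the statistic is 0
print(poisson_lr_test(counts, 2.0))   # large statistic: H0 rejected at alpha = 0.05
```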

$\chi^2$-Test for Goodness-of-Fit

Hypothesis: The $N$ measured values $y_i$ with normally distributed errors $\sigma_i$ are described by given quantities $f_i$.
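Table H.5: Frequently used statistical tests for a sample from a normal distribution with mean $\lambda$ and variance $\sigma^2$. (Case 1: $\sigma$ known; case 2: $\sigma$ unknown (Student's test); case 3: $\chi^2$-test of the variance; case 4: Student's difference test of two samples of sizes $N_1$ and $N_2$ on the significance of $s_\Delta^2$, cf. (8.3.19); case 5: F-test of two samples.)

Case 1. Test statistic $T = \dfrac{\bar{x} - \lambda_0}{\sigma/\sqrt{N}}$, with $\bar{x} = \dfrac{1}{N}\sum_{j=1}^{N} x^{(j)}$. Critical regions: $|T| > \Omega(1 - \alpha/2)$ for $H_0(\lambda = \lambda_0)$; $T > \Omega(1 - \alpha)$ for $H_0(\lambda \le \lambda_0)$; $T < \Omega(\alpha)$ for $H_0(\lambda \ge \lambda_0)$. Number of degrees of freedom: none.

Case 2. Test statistic $T = \dfrac{\bar{x} - \lambda_0}{s/\sqrt{N}}$, with $s^2 = \dfrac{1}{N-1}\sum_{j=1}^{N}(x^{(j)} - \bar{x})^2$. Critical regions: $|T| > t_{1-\alpha/2}$ for $H_0(\lambda = \lambda_0)$; $T > t_{1-\alpha}$ for $H_0(\lambda \le \lambda_0)$; $T < -t_{1-\alpha} = t_\alpha$ for $H_0(\lambda \ge \lambda_0)$. Number of degrees of freedom: $N - 1$.

Case 3. Test statistic $T = (N-1)\dfrac{s^2}{\sigma_0^2}$. Critical regions: $T < \chi^2_{\alpha/2}$ or $T > \chi^2_{1-\alpha/2}$ for $H_0(\sigma^2 = \sigma_0^2)$; $T > \chi^2_{1-\alpha}$ for $H_0(\sigma^2 \le \sigma_0^2)$; $T < \chi^2_{\alpha}$ for $H_0(\sigma^2 \ge \sigma_0^2)$. Number of degrees of freedom: $N - 1$.

Case 4. Test statistic $T = \dfrac{\bar{x}_1 - \bar{x}_2}{s_\Delta}$. Critical region: $|T| > t_{1-\alpha/2}$ for $H_0(\lambda_1 = \lambda_2)$. Number of degrees of freedom: $N_1 + N_2 - 2$.

Case 5. Test statistic $T = s_1^2/s_2^2$, with $s_i^2 = \dfrac{1}{N_i - 1}\sum_{j=1}^{N_i}(x_i^{(j)} - \bar{x}_i)^2$. Critical regions: $T < F_{\alpha/2}$ or $T > F_{1-\alpha/2}$ for $H_0(\sigma_1^2 = \sigma_2^2)$; $T > F_{1-\alpha}$ for $H_0(\sigma_1^2 \le \sigma_2^2)$. Numbers of degrees of freedom: $f_1 = N_1 - 1$, $f_2 = N_2 - 1$.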
Test function: $T = \sum_{i=1}^{N}\dfrac{(y_i - f_i)^2}{\sigma_i^2}$.

Critical region: $T > \chi^2_{1-\alpha}$.

Number of degrees of freedom: $N$ (or $N - p$, if $p$ parameters are determined from the measurements).
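A minimal sketch of the test function; the data are invented for illustration. With $N = 4$ and no fitted parameters, the result is compared with $\chi^2_{0.95} = 9.488$ for $f = 4$ from Table I.7.

```python
def chi2_goodness_of_fit(y, f, sigma):
    """T = sum_i (y_i - f_i)^2 / sigma_i^2 and its number of degrees of freedom N."""
    if not (len(y) == len(f) == len(sigma)):
        raise ValueError("y, f and sigma must have the same length")
    T = sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))
    return T, len(y)


T, dof = chi2_goodness_of_fit([1.1, 1.9, 3.2, 3.9], [1.0, 2.0, 3.0, 4.0], [0.1] * 4)
print(T, dof, T > 9.488)   # 9.488 = chi^2_{0.95} for f = 4; here H0 is not rejected
```

If $p$ parameters of the $f_i$ had been fitted to the same data, the number of degrees of freedom returned would have to be reduced to $N - p$.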


The  Method  of  Least  Squares 

One considers a set of $m$ equations $f_k(x, \eta) = 0$, $k = 1, \ldots, m$, relating the $r$-vector of the unknowns $x = (x_1, x_2, \ldots, x_r)$ to the $n$-vector of measurable quantities $\eta = (\eta_1, \eta_2, \ldots, \eta_n)$. Instead of $\eta$, the quantities $y$ are measured, which deviate from them by the measurement errors $\varepsilon$, i.e., $y = \eta + \varepsilon$. The quantities $\varepsilon$ are assumed to be normally distributed about zero. This is expressed by the covariance matrix $C_y = G_y^{-1}$. In order to obtain the solution $\tilde{x}$, $\tilde{\eta}$, one expands the $f_k$ about the first approximations $x_0$, $\eta_0 = y$. Only the linear terms of the expansion are kept, and the second approximation $x_1 = x_0 + \tilde{\xi}$, $\eta_1 = \eta_0 + \tilde{\delta}$ is computed. The procedure is repeated iteratively until certain convergence criteria are met, for example, until a scalar function $M$ no longer decreases. If the $f_k$ are linear in $x$ and $\eta$, then only one step is necessary. The method can be interpreted as a procedure to minimize $M$. The value of $M$ at the solution depends on the measurement errors. It is a random variable and follows a $\chi^2$-distribution with $f = m - r$ degrees of freedom. It can therefore be used for a $\chi^2$-test of the goodness-of-fit or of other assumptions, in particular the assumption $C_y = G_y^{-1}$. If the errors $\varepsilon$ are not normally distributed, but still distributed symmetrically about zero, then the least-squares solution $\tilde{x}$ still has the smallest variance among all unbiased estimators that are linear in the measurements, and one has $E(M) = m - r$ (Gauss-Markov theorem).
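When the $f_k$ are linear, a single step gives the solution. The sketch below assumes a straight-line model $\eta_k = x_1 + x_2 t_k$ (an illustrative choice) and applies the indirect-measurement formulas of Table H.7 with $x_0 = 0$: the rows of $A$ are $-(1, t_k)$, $c = y$, $\tilde{\xi} = -(A^{\mathrm T}G_yA)^{-1}A^{\mathrm T}G_y c$, and $G_{\tilde{x}}^{-1} = (A^{\mathrm T}G_yA)^{-1}$. The 2×2 algebra is written out so that no external library is needed; the function name is hypothetical.

```python
def fit_straight_line(t, y, sigma):
    """One least-squares step for eta_k = x1 + x2 * t_k (indirect measurements, x0 = 0)."""
    g = [1.0 / s**2 for s in sigma]                  # diagonal of G_y
    # A^T G_y A with A_k = -(1, t_k); the two minus signs cancel
    s0 = sum(g)
    s1 = sum(gk * tk for gk, tk in zip(g, t))
    s2 = sum(gk * tk * tk for gk, tk in zip(g, t))
    # -A^T G_y c with c = y - g(x0) = y
    b0 = sum(gk * yk for gk, yk in zip(g, y))
    b1 = sum(gk * tk * yk for gk, tk, yk in zip(g, t, y))
    det = s0 * s2 - s1 * s1
    x1 = (s2 * b0 - s1 * b1) / det                   # xi = -(A^T G A)^{-1} A^T G c
    x2 = (s0 * b1 - s1 * b0) / det
    cov = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]   # (A^T G_y A)^{-1}
    eps = [yk - x1 - x2 * tk for tk, yk in zip(t, y)]      # eps = A xi + c
    M = sum(gk * e * e for gk, e in zip(g, eps))
    return (x1, x2), cov, M, len(y) - 2              # f = m - r degrees of freedom


print(fit_straight_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], [1.0] * 4))
```

For data lying exactly on a line the minimum function is zero; with real data, $M$ would be compared with a $\chi^2$ quantile for $m - r$ degrees of freedom.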
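Table H.6: Least squares in the general case and in the case of constrained measurements.

General case

Equations: $f_k(x, \eta) = 0$, $k = 1, \ldots, m$

First approximations: $x_0$, $\eta_0 = y$

Equations expanded: $f = A\xi + B\delta + c + \cdots$, with $\{A\}_{kl} = (\partial f_k/\partial x_l)_{x_0, \eta_0}$, $\{B\}_{kl} = (\partial f_k/\partial \eta_l)_{x_0, \eta_0}$, $c = f(x_0, \eta_0)$

Covariance matrix of the measurements: $C_y = G_y^{-1}$

Corrections: $\tilde{\xi} = -(A^{\mathrm T}G_BA)^{-1}A^{\mathrm T}G_Bc$, $\tilde{\delta} = -G_y^{-1}B^{\mathrm T}G_B(A\tilde{\xi} + c)$, with $G_B = (BG_y^{-1}B^{\mathrm T})^{-1}$

Next step: $x_1 = x_0 + \tilde{\xi}$, $\eta_1 = \eta_0 + \tilde{\delta}$; new values for $A$, $B$, $c$, $\tilde{\xi}$, $\tilde{\delta}$

Solution (after $s$ steps): $\tilde{x} = x_{s-1} + \tilde{\xi}$, $\tilde{\eta} = \eta_{s-1} + \tilde{\delta}$, $\tilde{\varepsilon} = y - \tilde{\eta}$

Minimum function: $M = (B\tilde{\varepsilon})^{\mathrm T}G_B(B\tilde{\varepsilon})$

Covariance matrices: $G_{\tilde{x}}^{-1} = (A^{\mathrm T}G_BA)^{-1}$; $G_{\tilde{\eta}}^{-1} = G_y^{-1} - G_y^{-1}B^{\mathrm T}G_BBG_y^{-1} + G_y^{-1}B^{\mathrm T}G_BA(A^{\mathrm T}G_BA)^{-1}A^{\mathrm T}G_BBG_y^{-1}$

Constrained measurements

Equations: $f_k(\eta) = 0$, $k = 1, \ldots, m$

First approximations: $\eta_0 = y$

Equations expanded: $f = B\delta + c + \cdots$, with $\{B\}_{kl} = (\partial f_k/\partial \eta_l)_{\eta_0}$, $c = f(\eta_0)$

Covariance matrix of the measurements: $C_y = G_y^{-1}$

Corrections: $\tilde{\delta} = -G_y^{-1}B^{\mathrm T}G_Bc$, with $G_B = (BG_y^{-1}B^{\mathrm T})^{-1}$

Next step: $\eta_1 = \eta_0 + \tilde{\delta}$; new values for $B$, $c$, $\tilde{\delta}$

Solution (after $s$ steps): $\tilde{\eta} = \eta_{s-1} + \tilde{\delta}$, $\tilde{\varepsilon} = y - \tilde{\eta}$

Minimum function: $M = (B\tilde{\varepsilon})^{\mathrm T}G_B(B\tilde{\varepsilon})$

Covariance matrix: $G_{\tilde{\eta}}^{-1} = G_y^{-1} - G_y^{-1}B^{\mathrm T}G_BBG_y^{-1}$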
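An illustrative example of constrained measurements (not taken from the text): three measured angles of a plane triangle must satisfy $\eta_1 + \eta_2 + \eta_3 = 180°$. The constraint is linear, so one step suffices; with $B = (1, 1, 1)$ the matrix $G_B$ is the scalar $1/\sum_i\sigma_i^2$ and each correction is proportional to the variance of its measurement. The sketch implements the right-hand column of Table H.6 for this special case; the function name and numbers are hypothetical.

```python
def adjust_angles(y, sigma, total=180.0):
    """Constrained measurements with the single linear constraint eta_1 + eta_2 + eta_3 = total."""
    var = [s**2 for s in sigma]                 # G_y^{-1} = diag(sigma_i^2)
    g_b = 1.0 / sum(var)                        # G_B = (B G_y^{-1} B^T)^{-1} with B = (1, 1, 1)
    c = sum(y) - total                          # c = f(eta_0), eta_0 = y
    delta = [-v * g_b * c for v in var]         # delta = -G_y^{-1} B^T G_B c
    eta = [yi + di for yi, di in zip(y, delta)]
    M = g_b * c * c                             # M = (B eps)^T G_B (B eps), and B eps = c
    cov = [[(vi if i == j else 0.0) - vi * vj * g_b for j, vj in enumerate(var)]
           for i, vi in enumerate(var)]         # G_y^{-1} - G_y^{-1} B^T G_B B G_y^{-1}
    return eta, M, cov


eta, M, cov = adjust_angles([60.5, 59.8, 60.3], [0.2, 0.2, 0.4])
print(eta, M)   # the least precise angle receives the largest correction
```

Because the adjusted angles satisfy the constraint exactly, the variance of their sum, obtained by summing all elements of the covariance matrix, vanishes.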
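Table H.7: Least squares for indirect and direct measurements.

Indirect measurements

Equations: $f_k = \eta_k - g_k(x) = 0$

First approximations: $x_0$, $\eta_0 = y$

Equations expanded: $f = A\xi + \delta + c + \cdots$, with $\{A\}_{kl} = (\partial f_k/\partial x_l)_{x_0}$, $c = y - g(x_0)$

Covariance matrix of the measurements: $C_y = G_y^{-1}$

Corrections: $\tilde{\xi} = -(A^{\mathrm T}G_yA)^{-1}A^{\mathrm T}G_yc$

Next step: $x_1 = x_0 + \tilde{\xi}$; new values for $A$, $c$, $\tilde{\xi}$

Solution (after $s$ steps): $\tilde{x} = x_{s-1} + \tilde{\xi}$, $\tilde{\varepsilon} = A\tilde{\xi} + c$

Minimum function: $M = \tilde{\varepsilon}^{\mathrm T}G_y\tilde{\varepsilon}$

Covariance matrices: $G_{\tilde{x}}^{-1} = (A^{\mathrm T}G_yA)^{-1}$; $G_{\tilde{\eta}}^{-1} = A(A^{\mathrm T}G_yA)^{-1}A^{\mathrm T}$

Direct measurements of different accuracy

Equations: $f_k = \eta_k - x = 0$

First approximations: $x_0 = 0$, $\eta_0 = y$

Equations expanded: $f = A\xi + \delta + c$, with $A = -(1, 1, \ldots, 1)^{\mathrm T}$, $c = y$

Covariance matrix of the measurements: $C_y = G_y^{-1} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$

Solution: $\tilde{\xi} = \tilde{x} = \dfrac{\sum_k y_k/\sigma_k^2}{\sum_k 1/\sigma_k^2}$, $\tilde{\varepsilon}_k = y_k - \tilde{x}$

Minimum function: $M = \tilde{\varepsilon}^{\mathrm T}G_y\tilde{\varepsilon}$

Covariance: $G_{\tilde{x}}^{-1} = \sigma^2(\tilde{x}) = \left(\sum_k 1/\sigma_k^2\right)^{-1}$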
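The right-hand column reduces to the weighted mean. A minimal sketch with two invented measurements; the function name is illustrative.

```python
import math


def weighted_mean(y, sigma):
    """Direct measurements of different accuracy: x~, sigma(x~) and the minimum function M."""
    g = [1.0 / s**2 for s in sigma]                       # 1/sigma_k^2
    x = sum(gk * yk for gk, yk in zip(g, y)) / sum(g)     # x~
    err = math.sqrt(1.0 / sum(g))                         # sigma(x~)
    M = sum(gk * (yk - x) ** 2 for gk, yk in zip(g, y))   # eps^T G_y eps
    return x, err, M


print(weighted_mean([10.0, 10.4], [0.1, 0.2]))
```

Here $M$ has $n - 1 = 1$ degree of freedom; its value of about 3.2 lies below $\chi^2_{0.95} = 3.841$ (Table I.7), so the two measurements are compatible at the 5 % level.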
Analysis of Variance

The influence of external variables on a measured random variable $x$ is investigated. One tries to decide by means of appropriately constructed F-tests whether $x$ is independent of the external variables. Various models are constructed depending on the number of external variables and the assumptions concerning their influence.

A simple model is the crossed two-way classification with multiple observations. Two external variables are used to classify the observations into the classes $A_i$, $B_j$ ($i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$). Each class $A_i$, $B_j$ contains $K$ observations $x_{ijk}$ ($k = 1, 2, \ldots, K$). One assumes the model

$$x_{ijk} = \mu + a_i + b_j + (ab)_{ij} + \varepsilon_{ijk}\ ,$$

where the error of the observation $\varepsilon_{ijk}$ is assumed to be normally distributed about zero, and $a_i$, $b_j$, and $(ab)_{ij}$ are the influences of the classifications in $A$, $B$ and their interactions. Three null hypotheses

$$H_0^{(A)}(a_i = 0,\ i = 1, \ldots, I)\ ,\quad H_0^{(B)}(b_j = 0,\ j = 1, \ldots, J)\ ,\quad H_0^{(AB)}((ab)_{ij} = 0,\ i = 1, \ldots, I,\ j = 1, \ldots, J)$$

can be tested with the ratios $F^{(A)}$, $F^{(B)}$, $F^{(AB)}$. They are summarized in an analysis of variance (ANOVA) table. For other models see Chap. 11.
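The three ratios can be sketched as follows, assuming the usual sums of squares for the balanced crossed two-way classification with $K > 1$ observations per class, each divided by its number of degrees of freedom ($I - 1$, $J - 1$, $(I-1)(J-1)$ and $IJ(K-1)$ for the within-class term). The full ANOVA table is given in Chap. 11; the function name and the data layout `x[i][j][k]` are illustrative. Under the corresponding null hypothesis each ratio would be compared with a quantile $F_{1-\alpha}$ of the F-distribution (Table I.8).

```python
def two_way_anova(x):
    """F ratios for a balanced crossed two-way classification; x[i][j] holds the K
    observations of class (A_i, B_j)."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / K for j in range(J)] for i in range(I)]   # class means
    row = [sum(cell[i]) / J for i in range(I)]                          # means over A_i
    col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]     # means over B_j
    grand = sum(row) / I
    ss_a = J * K * sum((r - grand) ** 2 for r in row)
    ss_b = I * K * sum((c - grand) ** 2 for c in col)
    ss_ab = K * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(I) for j in range(J))
    ss_w = sum((v - cell[i][j]) ** 2
               for i in range(I) for j in range(J) for v in x[i][j])
    s2_w = ss_w / (I * J * (K - 1))          # within-class mean square (error estimate)
    return {
        "F_A": ss_a / (I - 1) / s2_w,
        "F_B": ss_b / (J - 1) / s2_w,
        "F_AB": ss_ab / ((I - 1) * (J - 1)) / s2_w,
        "SS": (ss_a, ss_b, ss_ab, ss_w),
    }


data = [[[2.1, 2.5, 1.9], [3.0, 3.4, 2.8]],
        [[4.2, 3.9, 4.5], [2.2, 2.6, 2.0]]]
print(two_way_anova(data))
```

For a balanced design the four sums of squares add up to the total sum of squares about the grand mean, which is a convenient check on an implementation.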


Polynomial  Regression 

Problem: The true values $\eta(t)$, for which one has $N$ measurements $y_i(t_i)$ with normally distributed measurement errors $\sigma_i$, are to be described by a polynomial of order $r - 1$ in the controlled variable $t$. Instead of $\eta(t) = x_1 + x_2t + \cdots + x_rt^{r-1}$ one writes

$$\eta(t) = x_1f_1(t) + x_2f_2(t) + \cdots + x_rf_r(t)\ .$$

Here the $f_j$ are orthogonal polynomials of order $j - 1$,

$$f_j(t) = \sum_{k=1}^{j} b_{jk}\,t^{k-1}\ ,$$

whose coefficients $b_{jk}$ are determined by the orthogonality conditions

$$\sum_{i=1}^{N} g_i\,f_j(t_i)\,f_k(t_i) = \delta_{jk}\ ,\qquad g_i = 1/\sigma_i^2\ .$$

The unknowns $x_j$ are obtained by least squares from

$$\sum_{i=1}^{N} g_i\left\{y_i(t_i) - \sum_{j=1}^{r} x_jf_j(t_i)\right\}^2 = \min\ .$$

The covariance matrix of the $x_j$ is the $r$-dimensional unit matrix.
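A minimal sketch, assuming Gram-Schmidt orthogonalization of the monomials $1, t, \ldots, t^{r-1}$ with the weights $g_i$ as one way to satisfy the orthogonality conditions. It produces the values $f_j(t_i)$ at the measurement points rather than the coefficients $b_{jk}$. Because the normal equations then have the unit matrix, each unknown is a single weighted sum, $x_j = \sum_i g_i\,y_i\,f_j(t_i)$. Function names and data are illustrative.

```python
def orthonormal_polynomials(t, g, r):
    """Values f_j(t_i), j = 1..r, of polynomials of order j-1 with sum_i g_i f_j f_k = delta_jk."""
    basis = []
    for power in range(r):
        v = [ti ** power for ti in t]                     # start from t^(j-1)
        for f in basis:                                   # Gram-Schmidt: remove lower orders
            proj = sum(gi * vi * fi for gi, vi, fi in zip(g, v, f))
            v = [vi - proj * fi for vi, fi in zip(v, f)]
        norm = sum(gi * vi * vi for gi, vi in zip(g, v)) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def orthogonal_poly_fit(t, y, sigma, r):
    """Least-squares coefficients x_j and fitted values of a polynomial of order r-1."""
    g = [1.0 / s**2 for s in sigma]
    f = orthonormal_polynomials(t, g, r)
    x = [sum(gi * yi * fi for gi, yi, fi in zip(g, y, fj)) for fj in f]   # unit normal matrix
    fitted = [sum(xj * fj[i] for xj, fj in zip(x, f)) for i in range(len(t))]
    return x, fitted, f


t = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.02, 1.31, 1.01, 0.24, -1.05]
x, fitted, f = orthogonal_poly_fit(t, y, [0.05] * 5, 3)
print(x)
```

A practical advantage of this form is that raising the order from $r$ to $r + 1$ leaves $x_1, \ldots, x_r$ unchanged, so one can add terms until the next coefficient is no longer significant.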


Time  Series  Analysis 

One is given a series of measured values $y_i(t_i)$, $i = 1, \ldots, n$, which (in an unknown way) depend on a controlled variable $t$ (usually time). One treats the $y_i$ as the sum of a trend $\eta_i$ and an error $\varepsilon_i$, $y_i = \eta_i + \varepsilon_i$. The measurements are carried out at regular time intervals, i.e., $t_i - t_{i-1} = \text{const}$. In order to minimize the errors $\varepsilon_i$, a moving average is constructed for every $t_i$ ($k < i \le n - k$) by fitting a polynomial of order $\ell$ to the $2k + 1$ measurements situated symmetrically about measurement $i$. The result of the fit at the point $t_i$ is the moving average

$$\tilde{\eta}_0(i) = a_{-k}\,y_{i-k} + a_{-k+1}\,y_{i-k+1} + \cdots + a_k\,y_{i+k}\ .$$

The coefficients $a_{-k}, \ldots, a_k$ are given in Table 13.1 for low values of $k$ and $\ell$. For the beginning and end points $t_i$ ($i \le k$, $i > n - k$), the results of the fit can be used with caution also for points other than at the center of the interval of the $2k + 1$ measurements.
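The coefficients follow from a least-squares fit of a polynomial of order $\ell$ to $2k + 1$ equally spaced points $j = -k, \ldots, k$: the fitted value at $j = 0$ is the first component of $(D^{\mathrm T}D)^{-1}D^{\mathrm T}y$, where $D$ has rows $(1, j, \ldots, j^\ell)$. The sketch below computes them exactly with `fractions.Fraction`; the function name is hypothetical, and Table 13.1 itself is not reproduced here. For $k = 2$ and $\ell = 2$ it yields the familiar weights $(-3, 12, 17, 12, -3)/35$, and, because odd powers do not affect the center of a symmetric window, $\ell = 3$ gives the same weights.

```python
from fractions import Fraction


def moving_average_coefficients(k, ell):
    """Weights a_{-k}..a_k such that the centre value of a least-squares polynomial of
    order ell through 2k+1 equally spaced points is sum_j a_j y_{i+j}; needs 2k >= ell."""
    js = range(-k, k + 1)
    n = ell + 1
    # Augmented normal matrix [D^T D | e_0] with (D^T D)_{mq} = sum_j j^(m+q)
    a = [[Fraction(sum(j ** (m + q) for j in js)) for q in range(n)] + [Fraction(int(m == 0))]
         for m in range(n)]
    for col in range(n):                          # Gauss-Jordan elimination (exact)
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    w = [a[m][n] / a[m][m] for m in range(n)]    # w = (D^T D)^{-1} e_0
    return [sum(w[m] * j ** m for m in range(n)) for j in js]


print([str(c) for c in moving_average_coefficients(2, 2)])
```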
Table I.1: Quantiles $\lambda_P(k)$ of the Poisson distribution,

$$P = \sum_{n=0}^{k-1}\frac{e^{-\lambda_P}\lambda_P^{\,n}}{n!}\ .$$

k\P   0.0005   0.0010   0.0050   0.0100   0.0250   0.0500   0.1000
  1    7.601    6.908    5.298    4.605    3.689    2.996    2.303
  2    9.999    9.233    7.430    6.638    5.572    4.744    3.890
  3   12.051   11.229    9.274    8.406    7.225    6.296    5.322
  4   13.934   13.062   10.977   10.045    8.767    7.754    6.681
  5   15.710   14.794   12.594   11.605   10.242    9.154    7.994
  6   17.411   16.455   14.150   13.108   11.668   10.513    9.275
  7   19.055   18.062   15.660   14.571   13.059   11.842   10.532
  8   20.654   19.626   17.134   16.000   14.423   13.148   11.771
  9   22.217   21.156   18.578   17.403   15.763   14.435   12.995
 10   23.749   22.657   19.998   18.783   17.085   15.705   14.206
 11   25.256   24.134   21.398   20.145   18.390   16.962   15.407
 12   26.739   25.589   22.779   21.490   19.682   18.208   16.598
 13   28.203   27.026   24.145   22.821   20.962   19.443   17.782
 14   29.650   28.446   25.497   24.139   22.230   20.669   18.958
 15   31.081   29.852   26.836   25.446   23.490   21.886   20.128
 16   32.498   31.244   28.164   26.743   24.740   23.097   21.292
 17   33.902   32.624   29.482   28.030   25.983   24.301   22.452
 18   35.294   33.993   30.791   29.310   27.219   25.499   23.606
 19   36.676   35.351   32.091   30.581   28.448   26.692   24.756
 20   38.047   36.701   33.383   31.845   29.671   27.879   25.903
 21   39.410   38.042   34.668   33.103   30.888   29.062   27.045
 22   40.764   39.375   35.946   34.355   32.101   30.240   28.184
 23   42.110   40.700   37.218   35.601   33.308   31.415   29.320
 24   43.449   42.019   38.484   36.841   34.511   32.585   30.453
 25   44.780   43.330   39.745   38.077   35.710   33.752   31.584
Table I.1: (continued)

k\P   0.9000   0.9500   0.9750   0.9900   0.9950   0.9990   0.9995
  1    0.105    0.051    0.025    0.010    0.005    0.001    0.001
  2    0.532    0.355    0.242    0.149    0.103    0.045    0.032
  3    1.102    0.818    0.619    0.436    0.338    0.191    0.150
  4    1.745    1.366    1.090    0.823    0.672    0.429    0.355
  5    2.433    1.970    1.623    1.279    1.078    0.739    0.632
  6    3.152    2.613    2.202    1.785    1.537    1.107    0.967
  7    3.895    3.285    2.814    2.330    2.037    1.520    1.348
  8    4.656    3.981    3.454    2.906    2.571    1.971    1.768
  9    5.432    4.695    4.115    3.507    3.132    2.452    2.220
 10    6.221    5.425    4.795    4.130    3.717    2.961    2.699
 11    7.021    6.169    5.491    4.771    4.321    3.491    3.202
 12    7.829    6.924    6.201    5.428    4.943    4.042    3.726
 13    8.646    7.690    6.922    6.099    5.580    4.611    4.269
 14    9.470    8.464    7.654    6.782    6.231    5.195    4.828
 15   10.300    9.246    8.395    7.477    6.893    5.794    5.402
 16   11.135   10.036    9.145    8.181    7.567    6.405    5.990
 17   11.976   10.832    9.903    8.895    8.251    7.028    6.590
 18   12.822   11.634   10.668    9.616    8.943    7.662    7.201
 19   13.671   12.442   11.439   10.346    9.644    8.306    7.822
 20   14.525   13.255   12.217   11.082   10.353    8.958    8.453
 21   15.383   14.072   12.999   11.825   11.069    9.619    9.093
 22   16.244   14.894   13.787   12.574   11.792   10.288    9.741
 23   17.108   15.719   14.580   13.329   12.521   10.964   10.397
 24   17.975   16.549   15.377   14.089   13.255   11.647   11.060
 25   18.844   17.382   16.179   14.853   13.995   12.337   11.730
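The entries of Table I.1 can be recomputed by bisection, since $P = \sum_{n=0}^{k-1}e^{-\lambda}\lambda^n/n!$ decreases monotonically as $\lambda$ grows. A minimal sketch; the function names and the upper bracket $k + 10\sqrt{k} + 20$ are illustrative choices that are adequate for the range of the table ($k \ge 1$, $0.0005 \le P \le 0.9995$).

```python
import math


def poisson_sum_below(k, lam):
    """P = sum_{n=0}^{k-1} e^{-lam} lam^n / n!  (decreases monotonically in lam)."""
    term = math.exp(-lam)
    total = term
    for n in range(1, k):
        term *= lam / n
        total += term
    return total


def poisson_quantile(k, p, iterations=100):
    """lambda_P(k) of Table I.1, found by bisection; k >= 1 and 0 < p < 1 assumed."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(k) + 20.0   # bracket suited to the table range
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if poisson_sum_below(k, mid) > p:
            lo = mid          # sum still too large: the quantile lies at larger lambda
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in (1, 2, 10):
    print(k, [round(poisson_quantile(k, p), 3) for p in (0.05, 0.1, 0.9, 0.95)])
```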
Table I.2: Normal distribution $\psi_0(x)$,

$$P(\mathsf{x} < x) = \psi_0(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-\xi^2/2)\,d\xi\ .$$

The column heading gives the second decimal of $x$; for negative $x$, e.g., the entry in row $-1.6$, column 5 is $\psi_0(-1.65)$.

    x      0      1      2      3      4      5      6      7      8      9
 -3.0  0.001  0.001  0.001  0.001  0.001  0.001  0.001  0.001  0.001  0.001
 -2.9  0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.001  0.001  0.001
 -2.8  0.003  0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.002
 -2.7  0.003  0.003  0.003  0.003  0.003  0.003  0.003  0.003  0.003  0.003
 -2.6  0.005  0.005  0.004  0.004  0.004  0.004  0.004  0.004  0.004  0.004
 -2.5  0.006  0.006  0.006  0.006  0.006  0.005  0.005  0.005  0.005  0.005
 -2.4  0.008  0.008  0.008  0.008  0.007  0.007  0.007  0.007  0.007  0.006
 -2.3  0.011  0.010  0.010  0.010  0.010  0.009  0.009  0.009  0.009  0.008
 -2.2  0.014  0.014  0.013  0.013  0.013  0.012  0.012  0.012  0.011  0.011
 -2.1  0.018  0.017  0.017  0.017  0.016  0.016  0.015  0.015  0.015  0.014
 -2.0  0.023  0.022  0.022  0.021  0.021  0.020  0.020  0.019  0.019  0.018
 -1.9  0.029  0.028  0.027  0.027  0.026  0.026  0.025  0.024  0.024  0.023
 -1.8  0.036  0.035  0.034  0.034  0.033  0.032  0.031  0.031  0.030  0.029
 -1.7  0.045  0.044  0.043  0.042  0.041  0.040  0.039  0.038  0.038  0.037
 -1.6  0.055  0.054  0.053  0.052  0.051  0.049  0.048  0.047  0.046  0.046
 -1.5  0.067  0.066  0.064  0.063  0.062  0.061  0.059  0.058  0.057  0.056
 -1.4  0.081  0.079  0.078  0.076  0.075  0.074  0.072  0.071  0.069  0.068
 -1.3  0.097  0.095  0.093  0.092  0.090  0.089  0.087  0.085  0.084  0.082
 -1.2  0.115  0.113  0.111  0.109  0.107  0.106  0.104  0.102  0.100  0.099
 -1.1  0.136  0.133  0.131  0.129  0.127  0.125  0.123  0.121  0.119  0.117
 -1.0  0.159  0.156  0.154  0.152  0.149  0.147  0.145  0.142  0.140  0.138
 -0.9  0.184  0.181  0.179  0.176  0.174  0.171  0.169  0.166  0.164  0.161
 -0.8  0.212  0.209  0.206  0.203  0.200  0.198  0.195  0.192  0.189  0.187
 -0.7  0.242  0.239  0.236  0.233  0.230  0.227  0.224  0.221  0.218  0.215
 -0.6  0.274  0.271  0.268  0.264  0.261  0.258  0.255  0.251  0.248  0.245
 -0.5  0.309  0.305  0.302  0.298  0.295  0.291  0.288  0.284  0.281  0.278
 -0.4  0.345  0.341  0.337  0.334  0.330  0.326  0.323  0.319  0.316  0.312
 -0.3  0.382  0.378  0.374  0.371  0.367  0.363  0.359  0.356  0.352  0.348
 -0.2  0.421  0.417  0.413  0.409  0.405  0.401  0.397  0.394  0.390  0.386
 -0.1  0.460  0.456  0.452  0.448  0.444  0.440  0.436  0.433  0.429  0.425
 -0.0  0.500  0.496  0.492  0.488  0.484  0.480  0.476  0.472  0.468  0.464
Table I.2: (continued)

    x      0      1      2      3      4      5      6      7      8      9
  0.0  0.500  0.504  0.508  0.512  0.516  0.520  0.524  0.528  0.532  0.536
  0.1  0.540  0.544  0.548  0.552  0.556  0.560  0.564  0.567  0.571  0.575
  0.2  0.579  0.583  0.587  0.591  0.595  0.599  0.603  0.606  0.610  0.614
  0.3  0.618  0.622  0.626  0.629  0.633  0.637  0.641  0.644  0.648  0.652
  0.4  0.655  0.659  0.663  0.666  0.670  0.674  0.677  0.681  0.684  0.688
  0.5  0.691  0.695  0.698  0.702  0.705  0.709  0.712  0.716  0.719  0.722
  0.6  0.726  0.729  0.732  0.736  0.739  0.742  0.745  0.749  0.752  0.755
  0.7  0.758  0.761  0.764  0.767  0.770  0.773  0.776  0.779  0.782  0.785
  0.8  0.788  0.791  0.794  0.797  0.800  0.802  0.805  0.808  0.811  0.813
  0.9  0.816  0.819  0.821  0.824  0.826  0.829  0.831  0.834  0.836  0.839
  1.0  0.841  0.844  0.846  0.848  0.851  0.853  0.855  0.858  0.860  0.862
  1.1  0.864  0.867  0.869  0.871  0.873  0.875  0.877  0.879  0.881  0.883
  1.2  0.885  0.887  0.889  0.891  0.893  0.894  0.896  0.898  0.900  0.901
  1.3  0.903  0.905  0.907  0.908  0.910  0.911  0.913  0.915  0.916  0.918
  1.4  0.919  0.921  0.922  0.924  0.925  0.926  0.928  0.929  0.931  0.932
  1.5  0.933  0.934  0.936  0.937  0.938  0.939  0.941  0.942  0.943  0.944
  1.6  0.945  0.946  0.947  0.948  0.949  0.951  0.952  0.953  0.954  0.954
  1.7  0.955  0.956  0.957  0.958  0.959  0.960  0.961  0.962  0.962  0.963
  1.8  0.964  0.965  0.966  0.966  0.967  0.968  0.969  0.969  0.970  0.971
  1.9  0.971  0.972  0.973  0.973  0.974  0.974  0.975  0.976  0.976  0.977
  2.0  0.977  0.978  0.978  0.979  0.979  0.980  0.980  0.981  0.981  0.982
  2.1  0.982  0.983  0.983  0.983  0.984  0.984  0.985  0.985  0.985  0.986
  2.2  0.986  0.986  0.987  0.987  0.987  0.988  0.988  0.988  0.989  0.989
  2.3  0.989  0.990  0.990  0.990  0.990  0.991  0.991  0.991  0.991  0.992
  2.4  0.992  0.992  0.992  0.992  0.993  0.993  0.993  0.993  0.993  0.994
  2.5  0.994  0.994  0.994  0.994  0.994  0.995  0.995  0.995  0.995  0.995
  2.6  0.995  0.995  0.996  0.996  0.996  0.996  0.996  0.996  0.996  0.996
  2.7  0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997  0.997
  2.8  0.997  0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.998
  2.9  0.998  0.998  0.998  0.998  0.998  0.998  0.998  0.999  0.999  0.999
  3.0  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999  0.999
Table I.3: Normal distribution $2\psi_0(x) - 1$,

$$P(|\mathsf{x}| < x) = 2\psi_0(x) - 1 = \frac{1}{\sqrt{2\pi}}\int_{-x}^{x}\exp(-\xi^2/2)\,d\xi\ .$$

The column heading gives the second decimal of $x$.

    x      0      1      2      3      4      5      6      7      8      9
  0.0  0.000  0.008  0.016  0.024  0.032  0.040  0.048  0.056  0.064  0.072
  0.1  0.080  0.088  0.096  0.103  0.111  0.119  0.127  0.135  0.143  0.151
  0.2  0.159  0.166  0.174  0.182  0.190  0.197  0.205  0.213  0.221  0.228
  0.3  0.236  0.243  0.251  0.259  0.266  0.274  0.281  0.289  0.296  0.303
  0.4  0.311  0.318  0.326  0.333  0.340  0.347  0.354  0.362  0.369  0.376
  0.5  0.383  0.390  0.397  0.404  0.411  0.418  0.425  0.431  0.438  0.445
  0.6  0.451  0.458  0.465  0.471  0.478  0.484  0.491  0.497  0.503  0.510
  0.7  0.516  0.522  0.528  0.535  0.541  0.547  0.553  0.559  0.565  0.570
  0.8  0.576  0.582  0.588  0.593  0.599  0.605  0.610  0.616  0.621  0.627
  0.9  0.632  0.637  0.642  0.648  0.653  0.658  0.663  0.668  0.673  0.678
  1.0  0.683  0.688  0.692  0.697  0.702  0.706  0.711  0.715  0.720  0.724
  1.1  0.729  0.733  0.737  0.742  0.746  0.750  0.754  0.758  0.762  0.766
  1.2  0.770  0.774  0.778  0.781  0.785  0.789  0.792  0.796  0.799  0.803
  1.3  0.806  0.810  0.813  0.816  0.820  0.823  0.826  0.829  0.832  0.835
  1.4  0.838  0.841  0.844  0.847  0.850  0.853  0.856  0.858  0.861  0.864
  1.5  0.866  0.869  0.871  0.874  0.876  0.879  0.881  0.884  0.886  0.888
  1.6  0.890  0.893  0.895  0.897  0.899  0.901  0.903  0.905  0.907  0.909
  1.7  0.911  0.913  0.915  0.916  0.918  0.920  0.922  0.923  0.925  0.927
  1.8  0.928  0.930  0.931  0.933  0.934  0.936  0.937  0.939  0.940  0.941
  1.9  0.943  0.944  0.945  0.946  0.948  0.949  0.950  0.951  0.952  0.953
  2.0  0.954  0.956  0.957  0.958  0.959  0.960  0.961  0.962  0.962  0.963
  2.1  0.964  0.965  0.966  0.967  0.968  0.968  0.969  0.970  0.971  0.971
  2.2  0.972  0.973  0.974  0.974  0.975  0.976  0.976  0.977  0.977  0.978
  2.3  0.979  0.979  0.980  0.980  0.981  0.981  0.982  0.982  0.983  0.983
  2.4  0.984  0.984  0.984  0.985  0.985  0.986  0.986  0.986  0.987  0.987
  2.5  0.988  0.988  0.988  0.989  0.989  0.989  0.990  0.990  0.990  0.990
  2.6  0.991  0.991  0.991  0.991  0.992  0.992  0.992  0.992  0.993  0.993
  2.7  0.993  0.993  0.993  0.994  0.994  0.994  0.994  0.994  0.995  0.995
  2.8  0.995  0.995  0.995  0.995  0.995  0.996  0.996  0.996  0.996  0.996
  2.9  0.996  0.996  0.996  0.997  0.997  0.997  0.997  0.997  0.997  0.997
  3.0  0.997  0.997  0.997  0.998  0.998  0.998  0.998  0.998  0.998  0.998
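Both tables can be regenerated from the error function, using the identities $\psi_0(x) = \tfrac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$ and $2\psi_0(x) - 1 = \mathrm{erf}(x/\sqrt{2})$. A minimal sketch with Python's `math.erf`; the function names are illustrative.

```python
import math


def psi0(x):
    """Standard normal distribution function psi_0(x) (Table I.2)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_psi0_minus_one(x):
    """P(|x| < x) = 2 psi_0(x) - 1 = erf(x / sqrt(2)) (Table I.3)."""
    return math.erf(x / math.sqrt(2.0))


# Row 1.0 of each table: x = 1.00, 1.01, ..., 1.09
print([f"{psi0(1.0 + j / 100):.3f}" for j in range(10)])
print([f"{two_psi0_minus_one(1.0 + j / 100):.3f}" for j in range(10)])
```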
Table I.4: Quantiles $x_P = \Omega(P)$ of the normal distribution,

$$P = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_P}\exp(-x^2/2)\,dx\ .$$

The column heading gives the second decimal of $P$.

    P      0      1      2      3      4      5      6      7      8      9
  0.0     -∞  -2.33  -2.05  -1.88  -1.75  -1.64  -1.55  -1.48  -1.41  -1.34
  0.1  -1.28  -1.23  -1.17  -1.13  -1.08  -1.04  -0.99  -0.95  -0.92  -0.88
  0.2  -0.84  -0.81  -0.77  -0.74  -0.71  -0.67  -0.64  -0.61  -0.58  -0.55
  0.3  -0.52  -0.50  -0.47  -0.44  -0.41  -0.39  -0.36  -0.33  -0.31  -0.28
  0.4  -0.25  -0.23  -0.20  -0.18  -0.15  -0.13  -0.10  -0.08  -0.05  -0.03
  0.5   0.00   0.03   0.05   0.08   0.10   0.13   0.15   0.18   0.20   0.23
  0.6   0.25   0.28   0.31   0.33   0.36   0.39   0.41   0.44   0.47   0.50
  0.7   0.52   0.55   0.58   0.61   0.64   0.67   0.71   0.74   0.77   0.81
  0.8   0.84   0.88   0.92   0.95   0.99   1.04   1.08   1.13   1.17   1.23
  0.9   1.28   1.34   1.41   1.48   1.55   1.64   1.75   1.88   2.05   2.33
Table I.5: Quantiles $x'_P = \Omega'(P)$ of the normal distribution,

$$P = \frac{1}{\sqrt{2\pi}}\int_{-x'_P}^{x'_P}\exp(-x^2/2)\,dx\ .$$

In each block the column heading gives the next decimal of $P$.

    P      0      1      2      3      4      5      6      7      8      9
  0.0  0.000  0.013  0.025  0.038  0.050  0.063  0.075  0.088  0.100  0.113
  0.1  0.126  0.138  0.151  0.164  0.176  0.189  0.202  0.215  0.228  0.240
  0.2  0.253  0.266  0.279  0.292  0.305  0.319  0.332  0.345  0.358  0.372
  0.3  0.385  0.399  0.412  0.426  0.440  0.454  0.468  0.482  0.496  0.510
  0.4  0.524  0.539  0.553  0.568  0.583  0.598  0.613  0.628  0.643  0.659
  0.5  0.674  0.690  0.706  0.722  0.739  0.755  0.772  0.789  0.806  0.824
  0.6  0.842  0.860  0.878  0.896  0.915  0.935  0.954  0.974  0.994  1.015
  0.7  1.036  1.058  1.080  1.103  1.126  1.150  1.175  1.200  1.227  1.254
  0.8  1.282  1.311  1.341  1.372  1.405  1.440  1.476  1.514  1.555  1.598
  0.9  1.645  1.695  1.751  1.812  1.881  1.960  2.054  2.170  2.326  2.576

    P      0      1      2      3      4      5      6      7      8      9
 0.90  1.645  1.650  1.655  1.660  1.665  1.670  1.675  1.680  1.685  1.690
 0.91  1.695  1.701  1.706  1.711  1.717  1.722  1.728  1.734  1.739  1.745
 0.92  1.751  1.757  1.762  1.768  1.774  1.780  1.787  1.793  1.799  1.805
 0.93  1.812  1.818  1.825  1.832  1.838  1.845  1.852  1.859  1.866  1.873
 0.94  1.881  1.888  1.896  1.903  1.911  1.919  1.927  1.935  1.943  1.951
 0.95  1.960  1.969  1.977  1.986  1.995  2.005  2.014  2.024  2.034  2.044
 0.96  2.054  2.064  2.075  2.086  2.097  2.108  2.120  2.132  2.144  2.157
 0.97  2.170  2.183  2.197  2.212  2.226  2.241  2.257  2.273  2.290  2.308
 0.98  2.326  2.346  2.366  2.387  2.409  2.432  2.457  2.484  2.512  2.543
 0.99  2.576  2.612  2.652  2.697  2.748  2.807  2.878  2.968  3.090  3.291

    P      0      1      2      3      4      5      6      7      8      9
0.990  2.576  2.579  2.583  2.586  2.590  2.594  2.597  2.601  2.605  2.608
0.991  2.612  2.616  2.620  2.624  2.628  2.632  2.636  2.640  2.644  2.648
0.992  2.652  2.656  2.661  2.665  2.669  2.674  2.678  2.683  2.687  2.692
0.993  2.697  2.702  2.706  2.711  2.716  2.721  2.727  2.732  2.737  2.742
0.994  2.748  2.753  2.759  2.765  2.770  2.776  2.782  2.788  2.794  2.801
0.995  2.807  2.814  2.820  2.827  2.834  2.841  2.848  2.855  2.863  2.870
0.996  2.878  2.886  2.894  2.903  2.911  2.920  2.929  2.938  2.948  2.958
0.997  2.968  2.978  2.989  3.000  3.011  3.023  3.036  3.048  3.062  3.076
0.998  3.090  3.105  3.121  3.138  3.156  3.175  3.195  3.216  3.239  3.264
0.999  3.291  3.320  3.353  3.390  3.432  3.481  3.540  3.615  3.719  3.891
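Both kinds of quantiles are available from Python's `statistics.NormalDist.inv_cdf`. Since $P(|\mathsf{x}| < x'_P) = 2\psi_0(x'_P) - 1$, the symmetric quantile is $\Omega'(P) = \Omega(\tfrac{1}{2}(1 + P))$. A minimal sketch; the names `omega` and `omega_prime` are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def omega(p):
    """Quantile x_P = Omega(P) of the standard normal distribution (Table I.4)."""
    return STD_NORMAL.inv_cdf(p)


def omega_prime(p):
    """Quantile x'_P = Omega'(P), defined by P(|x| < x'_P) = P (Table I.5)."""
    return STD_NORMAL.inv_cdf(0.5 * (1.0 + p))


print(round(omega(0.05), 2), round(omega_prime(0.95), 3), round(omega_prime(0.999), 3))
```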
Table I.6: $\chi^2$-distribution $F(\chi^2)$.
Table 1.7: Quantiles $\chi^2_P$ of the $\chi^2$-distribution, $P = \int_0^{\chi^2_P} f(\chi^2; f)\,d\chi^2$

                            P
   f     0.900     0.950     0.990     0.995     0.999
   1     2.706     3.841     6.635     7.879    10.828
   2     4.605     5.991     9.210    10.597    13.816
   3     6.251     7.815    11.345    12.838    16.266
   4     7.779     9.488    13.277    14.860    18.467
   5     9.236    11.070    15.086    16.750    20.515
   6    10.645    12.592    16.812    18.548    22.458
   7    12.017    14.067    18.475    20.278    24.322
   8    13.362    15.507    20.090    21.955    26.124
   9    14.684    16.919    21.666    23.589    27.877
  10    15.987    18.307    23.209    25.188    29.588
  11    17.275    19.675    24.725    26.757    31.264
  12    18.549    21.026    26.217    28.300    32.909
  13    19.812    22.362    27.688    29.819    34.528
  14    21.064    23.685    29.141    31.319    36.123
  15    22.307    24.996    30.578    32.801    37.697
  16    23.542    26.296    32.000    34.267    39.252
  17    24.769    27.587    33.409    35.718    40.790
  18    25.989    28.869    34.805    37.156    42.312
  19    27.204    30.144    36.191    38.582    43.820
  20    28.412    31.410    37.566    39.997    45.315
  30    40.256    43.773    50.892    53.672    59.703
  40    51.805    55.758    63.691    66.766    73.402
  50    63.167    67.505    76.154    79.490    86.661
  60    74.397    79.082    88.379    91.952    99.607
  70    85.527    90.531   100.425   104.215   112.317
  80    96.578   101.879   112.329   116.321   124.839
  90   107.565   113.145   124.116   128.299   137.208
 100   118.498   124.342   135.807   140.169   149.449
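The entries of Table 1.7 can be reproduced numerically by solving $P = F(\chi^2_P; f)$ for $\chi^2_P$. The following is a minimal Java sketch under stated assumptions: it is not the book's StatFunct class, the class and method names are hypothetical, and it assumes the standard relation $F(\chi^2; f) = P(f/2, \chi^2/2)$ with $P(a, y)$ the regularized lower incomplete gamma function. That function is summed from its power series, $\ln\Gamma$ uses the Lanczos approximation, and the quantile is found by plain bisection. These are common textbook choices, not necessarily the method used to compute the table.

```java
/**
 * Sketch (hypothetical class, not the book's StatFunct): quantile chi^2_P of
 * the chi-squared distribution with f degrees of freedom, obtained by solving
 * P = F(chi^2_P; f) with bisection. Uses F(x; f) = P(f/2, x/2), where
 * P(a, y) is the regularized lower incomplete gamma function.
 */
public final class ChiSquareQuantile {

    private ChiSquareQuantile() {}

    /** ln Gamma(a) for a > 0 (Lanczos approximation, relative error ~1e-10). */
    static double lnGamma(double a) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = a;
        double tmp = a + 5.5;
        tmp -= (a + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double cj : c) {
            y += 1.0;
            ser += cj / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / a);
    }

    /**
     * Regularized lower incomplete gamma function P(a, y), summed from its
     * power series. The series converges for all y > 0; this sketch assumes
     * moderate y (well below ~700) so that intermediate terms cannot overflow.
     */
    static double gammaP(double a, double y) {
        if (y <= 0.0) return 0.0;
        double term = 1.0 / a;
        double sum = term;
        double ap = a;
        for (int n = 0; n < 10000; n++) {
            ap += 1.0;
            term *= y / ap;
            sum += term;
            if (term < sum * 1e-16) break;   // remaining terms are negligible
        }
        return sum * Math.exp(-y + a * Math.log(y) - lnGamma(a));
    }

    /** Distribution function F(x; f) of the chi-squared distribution. */
    static double cdf(double x, int f) {
        return gammaP(0.5 * f, 0.5 * x);
    }

    /** Quantile chi^2_P for 0 < p < 1: bisection on cdf(x, f) = p. */
    static double quantile(double p, int f) {
        double lo = 0.0;
        double hi = 1.0;
        while (cdf(hi, f) < p) {      // widen the bracket until cdf(hi) >= p
            lo = hi;
            hi *= 2.0;
        }
        for (int i = 0; i < 200 && hi - lo > 1e-12 * hi; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, f) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // A few entries of Table 1.7 for comparison (rounded to 3 decimals).
        System.out.printf("f=1,  P=0.95: %.3f%n", quantile(0.95, 1));
        System.out.printf("f=10, P=0.99: %.3f%n", quantile(0.99, 10));
        System.out.printf("f=80, P=0.90: %.3f%n", quantile(0.90, 80));
    }
}
```

Running `main` should reproduce the tabulated values 3.841, 23.209 and 96.578. For $f = 2$ the distribution function is $1 - e^{-\chi^2/2}$, so $\chi^2_P = -2\ln(1-P)$ in closed form (5.991 for $P = 0.95$), which gives a convenient independent check. Bisection was chosen over Newton's method because it needs only the distribution function and cannot diverge.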
Table 1.8: Quantiles $F_P$ of the $F$-distribution, $P = \int_0^{F_P} f(F; f_1, f_2)\,dF$

P = 0.900 (columns: f1, rows: f2)

  f2      1      2      3      4      5      6      7      8      9     10
   1  39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44  59.86  60.19
   2  8.526  9.000  9.162  9.243  9.293  9.326  9.349  9.367  9.381  9.392
   3  5.538  5.462  5.391  5.343  5.309  5.285  5.266  5.252  5.240  5.230
   4  4.545  4.325  4.191  4.107  4.051  4.010  3.979  3.955  3.936  3.920
   5  4.060  3.780  3.619  3.520  3.453  3.405  3.368  3.339  3.316  3.297
   6  3.776  3.463  3.289  3.181  3.108  3.055  3.014  2.983  2.958  2.937
   7  3.589  3.257  3.074  2.961  2.883  2.827  2.785  2.752  2.725  2.703
   8  3.458  3.113  2.924  2.806  2.726  2.668  2.624  2.589  2.561  2.538
   9  3.360  3.006  2.813  2.693  2.611  2.551  2.505  2.469  2.440  2.416
  10  3.285  2.924  2.728  2.605  2.522  2.461  2.414  2.377  2.347  2.323

P = 0.950 (columns: f1, rows: f2)

  f2      1      2      3      4      5      6      7      8      9     10
   1  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9
   2  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40
   3  10.13  9.552  9.277  9.117  9.013  8.941  8.887  8.845  8.812  8.786
   4  7.709  6.944  6.591  6.388  6.256  6.163  6.094  6.041  5.999  5.964
   5  6.608  5.786  5.409  5.192  5.050  4.950  4.876  4.818  4.772  4.735
   6  5.987  5.143  4.757  4.534  4.387  4.284  4.207  4.147  4.099  4.060
   7  5.591  4.737  4.347  4.120  3.972  3.866  3.787  3.726  3.677  3.637
   8  5.318  4.459  4.066  3.838  3.687  3.581  3.500  3.438  3.388  3.347
   9  5.117  4.256  3.863  3.633  3.482  3.374  3.293  3.230  3.179  3.137
  10  4.965  4.103  3.708  3.478  3.326  3.217  3.135  3.072  3.020  2.978
Table 1.8 (continued): $P = \int_0^{F_P} f(F; f_1, f_2)\,dF$

P = 0.975 (columns: f1, rows: f2)

  f2      1      2      3      4      5      6      7      8      9     10
   1  647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3  968.6
   2  38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39  39.40
   3  17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47  14.42
   4  12.22  10.65  9.979  9.605  9.364  9.197  9.074  8.980  8.905  8.844
   5  10.01  8.434  7.764  7.388  7.146  6.978  6.853  6.757  6.681  6.619
   6  8.813  7.260  6.599  6.227  5.988  5.820  5.695  5.600  5.523  5.461
   7  8.073  6.542  5.890  5.523  5.285  5.119  4.995  4.899  4.823  4.761
   8  7.571  6.059  5.416  5.053  4.817  4.652  4.529  4.433  4.357  4.295
   9  7.209  5.715  5.078  4.718  4.484  4.320  4.197  4.102  4.026  3.964
  10  6.937  5.456  4.826  4.468  4.236  4.072  3.950  3.855  3.779  3.717
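The first column of Table 1.8 ($f_1 = 1$) can be cross-checked against Student's distribution in Table 1.9. If $T$ follows Student's distribution with $f_2$ degrees of freedom, then $T^2$ follows the $F$-distribution with $(1, f_2)$ degrees of freedom. By the symmetry of $T$, this gives the following relation and worked checks; the small discrepancies come only from rounding $t_P$ to three decimals:

$$
P = P\left(T^2 \le F_P\right) = P\left(|T| \le \sqrt{F_P}\right) = 2\,F_t\!\left(\sqrt{F_P}\right) - 1
\quad\Longrightarrow\quad
F_P(1, f_2) = t^2_{(1+P)/2}(f_2)
$$

$$
F_{0.900}(1, 10) = t^2_{0.950}(10) = 1.812^2 \approx 3.283 \quad (\text{tabulated: } 3.285)
$$

$$
F_{0.950}(1, 10) = t^2_{0.975}(10) = 2.228^2 \approx 4.964 \quad (\text{tabulated: } 4.965)
$$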
Table 1.9: Quantiles $t_P$ of Student's distribution, $P = \int_{-\infty}^{t_P} f(t; f)\,dt$

                                     P
    f   0.9000   0.9500   0.9750   0.9900   0.9950   0.9990   0.9995
    1    3.078    6.314   12.706   31.821   63.657  318.309  636.619
    2    1.886    2.920    4.303    6.965    9.925   22.327   31.599
    3    1.638    2.353    3.182    4.541    5.841   10.215   12.924
    4    1.533    2.132    2.776    3.747    4.604    7.173    8.610
    5    1.476    2.015    2.571    3.365    4.032    5.893    6.869
    6    1.440    1.943    2.447    3.143    3.707    5.208    5.959
    7    1.415    1.895    2.365    2.998    3.499    4.785    5.408
    8    1.397    1.860    2.306    2.896    3.355    4.501    5.041
    9    1.383    1.833    2.262    2.821    3.250    4.297    4.781
   10    1.372    1.812    2.228    2.764    3.169    4.144    4.587
   11    1.363    1.796    2.201    2.718    3.106    4.025    4.437
   12    1.356    1.782    2.179    2.681    3.055    3.930    4.318
   13    1.350    1.771    2.160    2.650    3.012    3.852    4.221
   14    1.345    1.761    2.145    2.624    2.977    3.787    4.140
   15    1.341    1.753    2.131    2.602    2.947    3.733    4.073
   16    1.337    1.746    2.120    2.583    2.921    3.686    4.015
   17    1.333    1.740    2.110    2.567    2.898    3.646    3.965
   18    1.330    1.734    2.101    2.552    2.878    3.610    3.922
   19    1.328    1.729    2.093    2.539    2.861    3.579    3.883
   20    1.325    1.725    2.086    2.528    2.845    3.552    3.850
   30    1.310    1.697    2.042    2.457    2.750    3.385    3.646
   40    1.303    1.684    2.021    2.423    2.704    3.307    3.551
   50    1.299    1.676    2.009    2.403    2.678    3.261    3.496
   60    1.296    1.671    2.000    2.390    2.660    3.232    3.460
   70    1.294    1.667    1.994    2.381    2.648    3.211    3.435
   80    1.292    1.664    1.990    2.374    2.639    3.195    3.416
   90    1.291    1.662    1.987    2.368    2.632    3.183    3.402
  100    1.290    1.660    1.984    2.364    2.626    3.174    3.390
  200    1.286    1.653    1.972    2.345    2.601    3.131    3.340
  500    1.283    1.648    1.965    2.334    2.586    3.107    3.310
 1000    1.282    1.646    1.962    2.330    2.581    3.098    3.300
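Table 1.9 can likewise be reproduced by inverting the distribution function. The following is a minimal Java sketch, not the book's StatFunct class; all names are hypothetical. It assumes the standard identity $P(|T| > |t|) = I_x(f/2, 1/2)$ with $x = f/(f+t^2)$, where $I_x(a,b)$ is the regularized incomplete beta function. $I_x$ is evaluated by its continued fraction (modified Lentz method, as popularized by Numerical Recipes), and $t_P$ is found by bisection.

```java
/**
 * Sketch (hypothetical class, not the book's StatFunct): quantile t_P of
 * Student's distribution with f degrees of freedom. For t >= 0,
 * F(t; f) = 1 - I_x(f/2, 1/2) / 2 with x = f / (f + t^2), where I_x(a, b)
 * is the regularized incomplete beta function; t_P is found by bisection.
 */
public final class StudentTQuantile {

    private static final double TINY = 1e-300;

    private StudentTQuantile() {}

    /** ln Gamma(a) for a > 0 (Lanczos approximation). */
    static double lnGamma(double a) {
        final double[] c = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = a;
        double tmp = a + 5.5;
        tmp -= (a + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double cj : c) {
            y += 1.0;
            ser += cj / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / a);
    }

    /** Keeps Lentz's intermediate quantities away from zero. */
    private static double guard(double v) {
        return Math.abs(v) < TINY ? TINY : v;
    }

    /** Continued fraction for I_x(a, b), evaluated with the modified Lentz method. */
    static double betaCf(double a, double b, double x) {
        double c = 1.0;
        double d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0));
        double h = d;
        for (int m = 1; m <= 1000; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2));       // even step
            d = 1.0 / guard(1.0 + aa * d);
            c = guard(1.0 + aa / c);
            h *= d * c;
            aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2));    // odd step
            d = 1.0 / guard(1.0 + aa * d);
            c = guard(1.0 + aa / c);
            double del = d * c;
            h *= del;
            if (Math.abs(del - 1.0) < 1e-15) break;
        }
        return h;
    }

    /** Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1. */
    static double betaI(double a, double b, double x) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double front = Math.exp(lnGamma(a + b) - lnGamma(a) - lnGamma(b)
                + a * Math.log(x) + b * Math.log(1.0 - x));
        if (x < (a + 1.0) / (a + b + 2.0)) {
            return front * betaCf(a, b, x) / a;
        }
        return 1.0 - front * betaCf(b, a, 1.0 - x) / b;   // I_x(a,b) = 1 - I_{1-x}(b,a)
    }

    /** Distribution function F(t; f) of Student's distribution. */
    static double cdf(double t, int f) {
        double x = f / (f + t * t);
        double tail = 0.5 * betaI(0.5 * f, 0.5, x);   // probability that T > |t|
        return t >= 0.0 ? 1.0 - tail : tail;
    }

    /** Quantile t_P for 0 < p < 1, by bisection on cdf(t, f) = p. */
    static double quantile(double p, int f) {
        if (p == 0.5) return 0.0;
        if (p < 0.5) return -quantile(1.0 - p, f);     // symmetry: t_{1-P} = -t_P
        double lo = 0.0;
        double hi = 1.0;
        while (cdf(hi, f) < p) {                       // widen the bracket
            lo = hi;
            hi *= 2.0;
        }
        for (int i = 0; i < 200 && hi - lo > 1e-12 * hi; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, f) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // Compare with entries of Table 1.9.
        System.out.printf("f=1,  P=0.975: %.3f%n", quantile(0.975, 1));
        System.out.printf("f=10, P=0.995: %.3f%n", quantile(0.995, 10));
    }
}
```

For $f = 1$ (the Cauchy case) the quantile has the closed form $t_P = \tan[\pi(P - 1/2)]$, e.g. 12.706 for $P = 0.975$, and for $f = 2$ it is $t_P = a\sqrt{2/(1-a^2)}$ with $a = 2P - 1$, e.g. 2.920 for $P = 0.95$. Both agree with the first two rows of the table and are useful checks of the sketch.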

List  of  Computer  Programs 


AnalysisOfVariance, 318, 319
AuxDer, 426, 428
AuxDri, 230, 426, 428
AuxGrad, 426, 428
AuxHesse, 426, 428
AuxJInputGroup, 427, 429
AuxJNumberInput, 428, 429
AuxJRButtonGroup, 427, 429
AuxZero, 427, 428
DatanFrame, 427, 428
DatanGraphics, 118, 431, 432, 435, 436
DatanMatrix, 348, 399
DatanRandom, 67
DatanUserFunction, 230, 257, 275
DatanVector, 348, 399
E10Mtx, 403
E1Anova, 319
E1Distrib, 106
E1Gr, 443
E1Lsq, 263
E1MaxLike, 173, 427
E1Min, 305
E1Mtx, 399
E1Random, 67
E1Reg, 329
E1Sample, 150
E1Test, 205
E1TimSer, 340
E2Anova, 319
E2Distrib, 107
E2Gr, 443
E2Lsq, 263
E2MaxLike, 173
E2Min, 305
E2Mtx, 400
E2Random, 67
E2Reg, 329
E2Sample, 150, 443
E2Test, 205
E2TimSer, 340
E3Distrib, 107
E3Gr, 444
E3Lsq, 263
E3Min, 306
E3Mtx, 400
E3Random, 68
E3Reg, 329
E3Sample, 150, 443
E3Test, 206
E4Gr, 444
E4Lsq, 264
E4Min, 306
E4Mtx, 400
E4Random, 68
E4Reg, 329
E4Sample, 150
E5Gr, 444
E5Lsq, 264
E5Mtx, 401
E5Sample, 150
E6Gr, 443, 445
E6Lsq, 264
E6Mtx, 401
E6Sample, 151
E7Gr, 443, 445
E7Lsq, 264
E7Mtx, 401
E7Sample, 151
E8Gr, 443, 446
E8Lsq, 265
E8Mtx, 402
E9Lsq, 265
E9Mtx, 402
FunctionOnLine, 275, 303
FunctionsDemo, 414, 422
Gamma, 422, 441
GraphicsWith2DScatterDiagram, 120, 150, 443
GraphicsWithDataPointsAndPolyline, 443
GraphicsWithDataPointsAndMultiplePolylines, 443
GraphicsWithHistogramAndPolyline, 443
GraphicsWithHistogram, 118, 150, 443
Histogram, 149, 150
LsqAsg, 263, 265
LsqAsm, 243, 262
LsqAsn, 243, 262, 264
LsqGEn, 264
LsqGen, 255, 257, 262, 264, 265
LsqLin, 224, 261, 263
LsqMar, 235, 243, 262, 264
LsqNon, 230, 235, 243, 261, 263
LsqPol, 223, 261, 263
MinAsy, 298, 305, 306
MinCjg, 291, 304, 305
MinCombined, 280, 303
MinCov, 297, 305
MinDir, 280, 281, 303
MinEnclose, 280, 303
MinMar, 294, 304, 305
MinParab, 303
MinPow, 288, 304, 305
MinQdr, 292, 304, 305
MinSim, 283, 303, 305
Regression, 329
S1Anova, 484
S1Distrib, 472
S1Lsq, 475
S1MaxLike, 473
S1Min, 483
S1Random, 470
S1Reg, 485
S1TimSer, 485
S2Anova, 484
S2Lsq, 476
S2MaxLike, 473
S2Min, 483
S2Random, 470
S2Reg, 485
S2TimSer, 486
S3Lsq, 478
S3Random, 471
S4Lsq, 479
S5Lsq, 479
S6Lsq, 481, 482
Sample, 115, 149, 150
SmallSample, 149, 150
StatFunct, 414, 427
TimeSeries, 340

*The slanted numbers refer to the Appendix.

Index 


a posteriori probability, 153
acceptance probability, 188
addition of errors
  quadratic, 105
alternative hypothesis, 186, 494
analysis of variance, 307, 500
  model, 312
  one-way, 307
  table, 310
  two-way, 311
    crossed classification, 483
    nested classification, 484
and, 10
ANOVA, 307
  table, 310
asymmetric error, 167, 298
asymptotically unbiased estimator, 165
average
  moving, 332, 501

background, 142
backward substitution, 374
basis, 351
beta function, 418
  incomplete, 420
bias, 157, 494
bidiagonal matrix, 350
binomial coefficient, 406, 418
binomial distribution, 70, 409, 451
  parameter estimation, 163
binomial theorem, 406
bit, 42
bracketing the minimum, 275
Breit-Wigner distribution, 25, 57, 470
byte, 43

Cauchy distribution, 23
central limit theorem, 90, 491
characteristic equation, 378
χ²-distribution, 130, 412, 453, 510
  quantiles, 511
χ²-test, 199, 206, 455
Cholesky decomposition, 372, 401
Cholesky inversion, 401
classification, 307, 500
  crossed, 313
  nested, 313, 316
  one-way, 311
  two-way, 313
clipping region, 436
cofactor, 361
color index, 436
column space, 352
column vector, 348
combination, 405
combinatorics, 405
computing coordinates, 433
confidence
  ellipsoid, 100, 240, 297
  interval, 326, 336
  level, 133, 494
  limits, 297, 298
  region, 170, 241, 260, 297, 479
conjugate directions, 285
consistent estimator, 111
constrained measurements, 243, 258
constraint, 396, 403
  equations, 244




contingency table, 203, 456
continued fraction, 418
contour line, 438
controlled variable, 218
convolution, 101, 452, 491
  of uniform distribution and normal distribution, 472
  of uniform distributions, 102, 471
  with the normal distribution, 103
coordinate cross
  in graphics, 440
correlation coefficient, 29, 489
correlation coefficient of a sample, 473
counted number, 137
covariance, 29, 32, 489
  ellipse, 97, 306
  ellipsoid, 99, 240
  matrix, 297, 489
    weighted, 393
critical region, 177, 187, 494

decile, 22
degrees of freedom, 128, 238
derivative
  logarithmic, 156
determinant, 360
device coordinates, 433
diagonal matrix, 349
dice, 11
direct measurements
  of different accuracy, 210
  of equal accuracy, 209
direct sum, 353
dispersion, 19
distribution
  binomial, 70, 409, 451
  Breit-Wigner, 51, 470
  Breit-Wigner, 25
  Cauchy, 23
  χ², 412, 453, 510
    quantiles, 511
  F, 413
    quantiles, 512
  frequency, 109
  function, 16, 487
    of sample, 110
    of several variables, 488
    of several variables, 30
    of two variables, 25
  Gaussian, 86
  hypergeometric, 74, 409
  Lorentz, 25
  multivariate normal, 452
  multivariate normal, 94
  normal, 86, 410, 451, 505, 507
    quantiles, 508, 509
  of a continuous variable, 492
  of a discrete variable, 491
  Poisson, 78, 410, 451
  Polya, 77
  standard normal, 84, 410
    quantiles, 508, 509
  Student's, 182, 413
    quantiles, 514
  t, 413
    quantiles, 514
  triangular, 58, 448, 470
  uniform, 22
  unimodal, 21

efficient estimator, 111
eigenvalue, 376
  equation, 376
eigenvector, 376
elements, 244
equations of constraint, 238
error
  asymmetric, 167, 173, 242, 260, 298, 306, 479
  combination, 164
  of mean, 113
  of sample variance, 113
  of the first kind, 187, 495
  of the second kind, 187, 495
  standard, 89
  statistical, 73, 106, 137
  symmetric, 297, 306
error bars, 117
error function, 411
error model of Laplace, 92
error propagation, 37, 450, 490
E space, 186
estimation, 111
estimator, 111, 493, 494
  asymptotically unbiased, 165, 494
  consistent, 111, 493
  efficiency of, 452
  efficient, 111
  minimum variance, 161
  unbiased, 111, 493
  unique, 494
event, 8, 487


expectation value, 17, 31, 488, 489
experiment, 7

factorial, 418
F-distribution, 413
  quantiles, 512
fit
  of a Breit-Wigner function, 478
  of a polynomial, 473
  of a power law, 475
  of an arbitrary linear function, 224
  of an exponential function, 232
  of a Breit-Wigner function, 480
  of a circle to points with measurement errors in abscissa and ordinate, 481
  of a Gaussian, 231, 263
  of a nonlinear function, 228
  of a polynomial, 222, 263
  of a proportionality, 224, 263
  of a straight line, 218
  of a straight line to points with measurement errors in abscissa and ordinate, 264
  of a sum of exponential functions, 233
  of a sum of two Gaussians and a polynomial, 235, 264
fluctuation
  statistical, 106
forward substitution, 374
frequency, 106, 490
frequency distribution, 109
F-test, 111, 205, 455
full width at half maximum (FWHM), 119
functional equation, 415
FWHM, 119

Galton's board, 107
gamma function, 415
  incomplete, 420
Gauss-Markov theorem, 238
Gaussian distribution, 86
  multivariate, 94
Gaussian elimination, 367
  with pivoting, 369
Givens transformation, 354
go button, 427
golden section, 278
graphics, 431
  class, 431
  workstation, 431

Hessian matrix, 271
histogram, 117, 454
  bin width, 117
  determination of parameters from, 306
Householder transformation, 356, 400
hypergeometric distribution, 74, 409
hyperplane, 353
hypothesis, 175, 186
  alternative, 186, 494
  composite, 186, 494
  null, 186, 494
  simple, 186, 494
  test of, 494

implementation, 50
incomplete beta function, 420
incomplete gamma function, 420
independence
  of events, 11
  of random variables, 26, 31
indirect measurements
  linear case, 214
  nonlinear case, 226
information, 160, 454
  inequality, 157, 161, 494
  of a sample, 494
input group, 427
interaction, 315
inverse matrix, 365

Jacobian determinant, 35, 490
  of an orthogonal transformation, 40

kernel, 352
Kronecker symbol, 128

Lagrange function, 248
Lagrange multipliers, 126, 247
Laplace error model, 92
law of large numbers, 73, 490
LCG, 45
least squares, 209, 362, 497
  according to Marquardt, 394
  constrained measurements, 243, 498
  direct measurements, 209, 499
  general case, 251, 498
  indirect measurements, 214, 226, 499
  properties of the solution, 236
  with constraints, 396, 403
  with change of scale, 393
  with weights, 392
likelihood
  equation, 155, 494
  function, 154, 493
    logarithmic, 155, 493
  ratio, 154
likelihood-ratio test, 194, 495
linear combination, 351
linear system of equations, 362
Lorentz distribution, 25
lotto, 12
LR-decomposition, 369

main diagonal, 349
mantissa, 43
mapping, 352
marginal probability density, 26, 31, 488
Marquardt minimization, 292
matrix, 348
  addition, 348
  adjoint, 361, 366
  antisymmetric, 350
  bidiagonal, 350
  diagonal, 349
  equations, 362
  inverse, 365
  main diagonal, 349
  multiplication
    by a constant, 348
    by a matrix, 348
  norm, 350
  null, 349
  orthogonal, 354
  product, 348
  pseudo-inverse, 375
  rank, 352
  singular, 352, 361
  subtraction, 348
  symmetric, 350
  transposition, 348
  triangular, 350
  tridiagonal, 350
  unit, 349
maximum likelihood, 493
maximum-likelihood estimates, 454
mean, 17
  error, 113
  of sample, 111
mean square, 128
mean square deviation, 128
median, 21, 488
minimization, 267
  combined method by Brent, 280
  along a direction, 280
  along chosen directions, 287
  along the coordinate directions, 284
  bracketing the minimum, 275
  by the method of Powell, 287
  choice of method, 295
  errors in, 296
  examples, 298
  golden section, 277
  in the direction of steepest descent, 271, 288
  Marquardt procedure, 292
  quadratic interpolation, 280
  simplex method, 281
  with the quadratic form, 292
minimum variance
  bound, 161
  estimator, 161
MLCG, 45
mode, 21, 488
moments, 18, 27, 31, 487, 489
  about the mean, 18
Monte Carlo method, 41
  for simulation, 66
  minimization, 483
  of integration, 64
moving average, 332, 501

Neyman-Pearson lemma, 191, 495
norm
  Euclidian, 350
normal distribution, 86, 410, 451, 505, 507
  multivariate, 63, 94, 452
  quantiles, 508, 509
  standard, 84, 410
normal equations, 365
null hypothesis, 186, 494
null matrix, 349
null space, 352
null vector, 349
number-input region, 428

one-sided test, 176
operating characteristic function, 188, 495
orthogonal, 349
orthogonal complements, 353
orthogonal matrix, 354
orthogonal polynomials, 500
orthogonal transformation, 354

parabola through three points, 273
Pascal's triangle, 92, 406
permutation, 405
permutation transformation, 359
pivot, 369
Poisson distribution, 78, 410, 451
  parameter estimation, 162
  quantiles, 503
Polya distribution, 77
polyline, 437
polymarker, 437
polynomial
  orthogonal, 500
polynomial regression, 500
population, 109, 492
portability, 50
power function, 188, 495
precision
  absolute, 44
  relative, 44
primitive element, 46
principal axes, 378
principal axis transformation, 377
probability, 487
  a posteriori, 153
  conditional, 10
  density, 16, 487
    conditional, 26
    joint, 488
    marginal, 26, 31, 488
    of several variables, 31
    of two variables, 26
  frequency definition, 9
  total, 10
pseudo-inverse matrix, 375

quadratic average of individual errors, 164
quantile, 22, 488
quartile, 22

radio-button group, 427
random component, 332
random number generator, 44
  linear congruential (LCG), 45
  multiplicative linear congruential (MLCG), 45
random numbers, 41
  arbitrarily distributed, 55
    acceptance-rejection technique, 59
    transformation procedures, 56
  normally distributed, 62
random variable, 15, 487
  continuous, 15
  discrete, 15
rank, 352
ratio of small numbers of events, 144
  with background, 147
reduced variable, 20, 488
regression, 321
  curve, 325
  polynomial, 325, 500
representation of numbers in a computer, 42
row space, 352
row vector, 348

sample, 74, 109, 492
  correlation coefficient, 473
  distribution function, 110
  error
    of variance, 113
    of mean, 113
  from a bivariate normal distribution, 173
  from a continuous population, 111
  from finite population, 127
  from Gaussian distribution, 130
  from partitioned population, 453
  from subpopulation, 122
  graphical representation, 115
  information, 160
  mean, 111, 150, 453
  random, 110
  size, 109
  small, 136
    with background, 142
  space, 7
  variance, 112, 150, 453
scalar, 349
scalar product, 349
scale factor, 213
scale in graphics, 439
scatter diagram, 150
  one-dimensional, 116
  two-dimensional, 120
seed, 52
sign inversion, 359
signal, 142
significance level, 175, 494
simplex, 282
singular matrix, 352, 361
singular value, 379
singular value analysis, 380, 383
singular value decomposition, 379, 385, 401, 402
skewness, 20, 488
small numbers of events, 136
  ratio, 144
    with background, 147
small sample, 136
span, 352
standard deviation, 19, 89, 488
standard normal distribution, 410
statistic, 111, 493
  test, 187
statistical error, 73, 137
statistical test, 175
steepest descent, 288
step diagram, 117
Student's difference test, 184
Student's distribution, 182, 413
  quantiles, 514
Student's test, 180, 205, 455
subpopulation, 122
subspace, 352
sum of squares, 127, 308
symmetric errors, 297
system of equations
  linear, 362
  triangular, 368
  underdetermined, 362

t-distribution, 182, 413
  quantiles, 514
test, 175, 494, 496
  χ², 199, 206, 455, 495
  likelihood-ratio, 194, 495
  most powerful, 188, 495
  one-sided, 176
  statistic, 176, 187, 495
  Student's, 205
  two-sided, 176
  unbiased, 188, 495
  uniformly most powerful, 188
text in plot, 447
three-door game, 13
time series, 331
  analysis, 331, 501
  extrapolation, 485
  discontinuities, 485
transformation
  Givens, 354, 400
  Householder, 356, 400
  linear, 36
  of a vector, 352
  of variables, 33, 449, 489
  orthogonal, 39, 354
  permutation, 359
  principal axis, 377
  sign inversion, 359
transposition, 348
trend, 332, 501
triangular distribution, 58, 448, 470
triangular matrix, 350
triangular system of equations, 368
tridiagonal matrix, 350
two-sided test, 176
2×2 table, 204
2×2 table test, 204

underdetermined system of equations, 362
uniform distribution, 22
unit matrix, 349
unit vector, 350

variance, 27, 32, 488, 489
  of sample, 112
  of an estimator, 454
  of a random variable, 18
vector, 348
  absolute value, 350
  components, 348
  norm, 350
  null, 349
  row, 348
  space, 351
    basis, 351
    closed, 351
    dimension, 351
  transformation, 352
  unit, 350
vectors
  linearly dependent, 351
  linearly independent, 351
  orthonormal, 351
viewport, 433

weight matrix, 215
weighted covariance matrix, 393
width
  full at half maximum, 119
Wilks theorem, 195, 495
window, 433
world coordinates, 432


